Preface


The book Algorithms as a Basis of Modern Applied Mathematics is part of the series 
“Studies in Fuzziness and Soft Computing” published by Springer. This volume 
is the outcome of scientific collaboration in the areas of Mathematics and Statistics 
among Prof. Šárka Hošková-Mayerová from the University of Defence (Brno, Czech
Republic), Prof. Cristina Flaut from the “Ovidius” University (Constanta, Romania), 
and Dr. Fabrizio Maturo from the University of Campania “Luigi Vanvitelli” (Caserta, 
Italy). 

Due to recent technological developments and the widespread distribution of
computers, telephones, and other devices that keep people constantly
connected to the World Wide Web, algorithms have grown in importance.
It is undeniable that we live in the age of algorithms; they dominate our everyday
life, process our data, give us suggestions, and guide us, more or less consciously,
through many processes of daily life. Algorithms play a fundamental role because of
their ability to simplify and synthesise the great mass of data that technology makes
available. From the simplest to the most complicated, it is impossible to deny the
importance of algorithms, both in our daily and online life. However, algorithms
are two-faced: on the one hand, they are useful and vital; on the other, they
can be dangerous because they have the power to influence people. An example is
the Google PageRank algorithm, which decides the order in which search results
are displayed. It can affect our decisions, our purchases, which university to attend,
whether to watch a movie or not, etc. Its power and danger are undeniable, and the
power of those who control it is equally undeniable. Equally relevant is the
importance of algorithms in financial systems that move billions of dollars every day. They
can buy and sell securities in milliseconds. Then there is the “you might also like
it” feature on every e-commerce website or, similarly, the Google AdSense algorithm, which
monitors our behaviour, time on sites, use of words, and search queries to provide
contextual advertising. On the face of it, most of these algorithms do nothing wrong:
they select and advise. The risk, however, is that people become “vulnerable” to
manipulation and propaganda because the information is “supervised”. Recently, Google
has developed a new algorithm, called Knowledge-Based Trust, which rewards or
penalises websites based on the reliability of the content they publish. Hence,
algorithms can suggest, guide, or even censor. In summary, in many cases, algorithms
decide for us.

Given the growing importance of algorithms, the main objective of this volume is 
to collect some contributions regarding them in various research areas and to present 
them organically. 

The book opens with the contribution entitled “Numerical Computation in Spiky
Domain” authored by Sen and Agarwal. The authors discuss the issue of uniform
accuracy, or the best attainable order of accuracy, and attempt to prescribe remedies for
several numerical problems such as those involving near-singular linear systems, zero
clusters of a nonlinear univariate equation, optimisation, and differential equations
involving singularity.

The second chapter is “Modeling the Relationship Between Energy, Carbon 
and Water Efficiency of the Most Sustainable Corporations in the World” by 
Barbulescu and Mascu. The authors use an integrated approach to evaluate how 
green businesses operate, and they analyse the relationship between energy, carbon 
and water based on the data from the top sustainable companies in the world. They 
propose two models to study the dependence of carbon on energy efficiency and 
carbon on energy and water efficiency. 

In the third chapter, namely “Multi-objective Evolutionary Algorithms:
Decomposition Versus Indicator-Based Approach”, Ionescu compares the performance of
four state-of-the-art multi-objective evolutionary algorithms representing two
fundamental approaches in recent research: indicator-based and decomposition-based. The
algorithms are compared with respect to six convergence and diversity performance 
measures, including the recently proposed IGD+, for benchmark test problems in 
two popular suites. 

Crasmareanu is the author of the contribution “The Hopf-Levi-Civita Data 
of Two-Dimensional Metrics”. The paper aims to study three two-dimensional 
geometries, namely Riemannian, Kähler, and Hessian, in a unitary way by using
three (local) Hermitian matrices. 

Flaut, Savin, and Zaharia, in the chapter “Some Applications of Fibonacci 
and Lucas Numbers”, provide new applications of Fibonacci and Lucas numbers. 
In some circumstances, they find algebraic structures on some sets defined with 
these numbers, and they also generalise Fibonacci and Lucas numbers by using an
arbitrary binary relation over the real field instead of the addition of real numbers.
The authors give properties of the obtained sequences and, by using some relations 
between Fibonacci and Lucas numbers, they provide a method to find examples of 
split quaternion algebras. 

Flaut, Hošková-Mayerová, and Vasile, in their contribution “Some Remarks
Regarding Finite Bounded Commutative BCK-Algebras”, provide some examples
of finite bounded commutative BCK-algebras, using the Wajsberg algebra
associated with a bounded commutative BCK-algebra. This method is an alternative to
Iseki’s construction, since some properties of the obtained algebras are lost under
Iseki’s extension.

The chapter “Revisiting Fixed Point Results with a Contractive Iterative
at a Point”, by Karapinar, discusses fixed point results with a
contractive iterative at a point in the setting of various abstract spaces. The aim is
to collect the corresponding results on the topic from the literature and to
combine and connect several existing results in this direction by generalising the
famous theorem of Matkowski.

Codarcea-Munteanu and Marin are the authors of the paper “An Algorithmic
Perspective on the Thermoelasticity of the Micromorphic Materials Using Fractional
Order Strain”. The chapter deals with the linear thermoelasticity of micromorphic
materials. The purposes are, first, to present an algorithmic perspective
on obtaining the constitutive equations of this linear theory and the form of the
non-Fourier heat equation; and second, to obtain a reciprocal relation by using the Caputo fractional
derivative. This algorithmic perspective allows the creation of a technical scheme that
can be followed in determining the fundamental relations of linear thermoelasticity 
of micromorphic materials, providing a model that can be used to determine these 
equations for other materials. 

The investigation titled “Numerical Algorithms in Mechanics of Generalized
Continua”, by Chirilă and Marin, analyses from a numerical point of view the solution
of the system of partial differential equations arising in the theory of isotropic dipolar 
thermoelasticity with double porosity. The authors write the variational formulation 
and introduce fully discrete approximations by using the finite element method to 
approximate the spatial variable and the backward Euler scheme to discretize the 
first-order time derivatives. By using these algorithms, numerical simulations show 
the behaviour of the solution. 

The research “A Practical Review on Intrusion Detection Systems by Known 
Data Mining Methods” is authored by Ghasem and Rafsanjani. The authors propose 
methods for intrusion detection in computer networks and review some classification 
and clustering methods. 

Bagherinezhad and Rafsanjani, in their study “Leaf Recognition Based on Image 
Processing”, deal with methods for automatic detection of plant species using image 
processing. They also present the problems and limitations of this research area as a 
future work guideline. 

The work presented by Gryshchuk and Plaksa is titled “A Hypercomplex Method 
for Solving Boundary Value Problems for Biharmonic Functions”. The authors 
develop a hypercomplex method for solving boundary value problems for
biharmonic functions. This method is based on a relation between biharmonic functions
and monogenic functions taking values in a commutative algebra associated with the 
biharmonic equation. 

The paper “On Algorithms Related to Expressibility of Functions of
Diagonalizable Algebras” by Rusu and Rusu deals with the issue of expressibility of
functions and related algorithmic problems. They propose and analyse algorithms for
detecting functional expressibility of functions of a 4-valued diagonalizable algebra
and examine other issues connected to iterative algebras based on diagonalizable
algebras.

In the study “An Algorithm for Solving the Sylvester s-Conjugate Elliptic
Quaternion Matrix Equations”, Tosun and Kösal investigate the existence of the solution to
Sylvester s-conjugate elliptic quaternion matrix equations. A solution is obtained in
an explicit form by means of a real representation of an elliptic quaternion matrix.
Moreover, pseudo-code for the method is presented.

The academic work “Deterministic Actions on Stochastic Ensembles of Particles 
Can Replicate Wavelike Behaviour of Quantum Mechanics: Does It Matter?” by 
Hodgson and Kisil analyses the probabilistic interpretation of quantum mechanics. 
The authors propose a synthesis of ideas ranging from those of the field’s founders
to modern contextual approaches. The concept is illustrated by a simple numerical
experiment imitating photon interference in two beam splitters. This example
demonstrates that deterministic physical principles can replicate wavelike probabilistic
effects when applied to stochastic ensembles of particles.

The chapter “Modeling of Selected Physical Phenomena Under Explosion 
Welding the Laminate AA2519-Ti6Al4V” authored by Turchyn, Pasternak, Sniezek, 
Szachogluchowicz, Hutsaylyuk, and Sulym presents analytical research of the joining 
process of the laminate layers produced by the explosive method. The proposed model 
allows determining the influence of connection parameters on the shape and quality 
of the connection zone. The analytical research uses the theory of the elastodynamic 
character of material deformation at the connection point due to local traction load. 

In the chapter “Modelling of the Stress-Strain State of a Viscoelastic Rectangular 
Plate in Condition Combine Load” by Pasternak, Turchyn, Hutsaylyuk, and Sulym, 
the dynamic problem of linear elastic theory for a thin rectangular plate, loaded on 
opposite sides with evenly distributed stresses, has been solved using the methods 
of solving dynamic problems for inhomogeneous centres and boundary elements 
for the study of geometrically complex and physically heterogeneous and nonlinear 
finite-dimensional bodies. 

Potůček, in his chapter titled “Formulas for the Sums of the Series of Reciprocals
of the Polynomial of Degree Two with Non-zero Integer Roots”, deals with the
sums of the series of reciprocals of the quadratic polynomials with non-zero integer
roots. It is a follow-up to and completion of the author’s previous papers dealing with
the sums of these series, where the quadratic polynomials have all possible types
of positive and negative integer roots. First, the conditions for the coefficients of a 
reduced quadratic equation are given so that this equation has only integer roots. 
Further, summary formulas for the sums of the series of reciprocals of the quadratic 
polynomials with non-zero integer roots are stated and derived. These formulas are 
verified by some examples using the basic programming language of the computer 
algebra system Maple. 

The contribution “Life Expectancy at Birth and Its Socioeconomic Determinants:
An Application of Random Forest Algorithm”, by Martin Cervantes, Rueda Lopez,
and Cruz Rambaud, studies the socioeconomic determinants of life expectancy at
birth in European countries from 2000 to 2017. The authors adopt not only
“traditional” variables such as per capita income and education but also sociocultural
differences and public expenditure on social protection. The chapter employs
the random forest algorithm to measure the importance of certain
variables related to life expectancy at birth. The findings indicate that the
variables with the greatest relative importance in explaining life expectancy at birth
are per capita income, the “Area”, the educational level, and public expenditure on
social protection.

Ventre, Longo, and Maturo, in the chapter “The Role of the Communication
and Information in Decision-Making Problems”, introduce the issues of rational
models in decision processes. The authors discuss the importance of the level of
information of each problem solver in causing inconsistent preferences in a decision
process (both in cooperative decision problems and in non-cooperative ones). They
show that only in particular cases does the optimal, rational decision match the
real one, which is influenced by different kinds of emotions. Furthermore, behavioural biases
also influence the modeller, so how the decision model is constructed and presented
differs from modeller to modeller, and these differences can also influence the
final choice of each agent.

Basaran and Simonetti present their contribution titled “Parameter Estimation 
of Fuzzy Linear Regression Utilizing Fuzzy Arithmetic and Fuzzy Inverse Matrix” 
where they deal with parameter estimation methods of fuzzy linear models and 
propose a novel method consisting of new notions and calculation procedures. 

The chapter “Integrated Approach to Risk Management for Public
Administrations: The Healthcare System”, authored by Fattoruso, Jannelli, Olivieri, and
Squillante, is about the problem of systemic risk, that is, the risk of a profound
transformation of the structure of an economic and/or financial system that occurs,
through a chain reaction, after the manifestation of a triggering event. The objective
of the paper is to create a function that has a predictive character in the definition
of company crises. This function is defined starting from the identification,
the mapping, and the coverage of the risk.

The chapter “A Memetic Algorithm for Solving the Rank Aggregation Problem” 
by Acampora, Iorio, Pandolfo, Siciliano, and Vitiello deals with the rank aggregation 
problem. The main issue, in this setting, is to find a ranking that best represents the 
synthesis of a group of judges. The authors propose to use, for the first time, a memetic 
algorithm to address this complex problem. The proposed memetic algorithm extends 
genetic algorithms with a hill-climbing search. As shown in an experimental session 
involving well-known datasets, the proposed algorithm outperforms the evolutionary 
state-of-the-art approaches. 

The chapter titled “Mathematical Optimization and Application of Nonlinear
Programming” by Vagaská and Gombár deals with mathematical optimization,
focusing on the special role of convex optimization and nonlinear optimization for
solving optimization problems. To solve optimization problems, nonlinear
programming in Matlab is used, and selected optimization methods and mathematical models
are discussed.

The chapter titled “Applicability of Selected Mathematical Statistics Methods
During Decision Activities Within Joint CBRN Defence COE” deals with the
application of statistical methods in a command and control process, which makes it
possible to find solutions to the problems encountered within NATO.

The purpose of the last paper is to count the number of fuzzy subgroups of an
abelian group $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^m}$, where p is a prime number and n, m are natural
numbers, up to isomorphism. Based on the observation of the results and on the
amount of data pertaining to this work, which has been checked by means of a
computer program, a combinatorial formula for the number of fuzzy subgroups of
G was conjectured.

In summary, the book Algorithms as a Basis of Modern Applied Mathematics is
addressed, in equal measure, to Mathematicians, Statisticians, and more generally to
scholars and specialists of different fields of research where algorithms play a crucial
role.


Brno, Czech Republic Šárka Hošková-Mayerová
Constanta, Romania Cristina Flaut
Caserta, Italy Fabrizio Maturo


Syamal K. Sen and Ravi P. Agarwal


Abstract In a widely used fixed precision environment a uniform best possible
accuracy is desired throughout the concerned domain. In this environment, generally
the accuracy of numerical computation is good in some regions of the domain while
in some other regions it may not be so good or may even be considered bad. Special
care is needed to tackle the low accuracy in the latter regions so that the overall
accuracy of the domain is improved and is comparable with the accuracy in the other, good
or, equivalently, non-spiky regions. It is possible that even after improving the low
accuracies to the best possible ones in the given context/fixed precision environment, these
improved accuracies may not be as good as or comparable with the good ones elsewhere.
However, the order of accuracy of the whole domain will be considered the lowest
order of accuracy existing in some region(s) of the domain. Neither the best accuracy
nor the average accuracy inside the domain will be termed as the accuracy of the
domain under consideration. The goal is to have the order as uniform as possible
throughout. We discuss this uniform accuracy issue or the best attainable order of
accuracy and attempt to prescribe remedies for several numerical problems such as
those involving near-singular linear systems, zero clusters of a non-linear univariate
equation, optimization, and differential equations having a singularity.


1 Introduction


Spiky domain In mathematics the domain of a function is the complete set of possible
values of the independent variables. The numerical computation of the function may
not have the same, uniform accuracy at all of these values.

If the accuracy is much lower at some of these values than at others, then
these values constitute the spiky or, equivalently, thorny regions/points of the domain.
The function computation at these points involves significantly more error and hence
these points may also be referred to as bouncy or humpy.

A general program (algorithm) needs to be appropriately modified so that the 
accuracy at the spiky points is consciously maximized/improved. Or, in other words, 
the accuracy of the computation of the function throughout the domain is made as 
uniform as possible. 

When we talk about the accuracy of numerical computation over a domain, the 
lowest order of accuracy will be considered as the overall accuracy of the
computation. Neither the average accuracy nor the highest (best) accuracy will be regarded
as the accuracy over the domain. 
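
The convention above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the test function (1 - cos x)/x^2, the sample points, and the helper names are assumptions chosen only for illustration, and the algebraically equivalent form 2 sin^2(x/2)/x^2 is assumed to be accurate enough to serve as the reference value.

```python
import math

def f_naive(x):
    # Direct formula: suffers from cancellation when x is close to 0.
    return (1.0 - math.cos(x)) / (x * x)

def f_stable(x):
    # Algebraically equivalent form, used here as the reference value.
    s = math.sin(0.5 * x)
    return 2.0 * s * s / (x * x)

def correct_digits(computed, reference):
    # Approximate number of correct significant digits, capped at 16.
    if computed == reference:
        return 16.0
    rel = abs(computed - reference) / abs(reference)
    return min(16.0, -math.log10(rel))

# Hypothetical sample of the domain; the points near 0 form the spiky region.
points = [1.0, 1e-2, 1e-4, 1e-6, 1e-8]
digits = [correct_digits(f_naive(x), f_stable(x)) for x in points]

# The accuracy over the domain is the lowest one, not the best or the average.
overall = min(digits)

for x, d in zip(points, digits):
    print(f"x = {x:8.1e}   correct digits ~ {d:4.1f}")
print("overall order of accuracy ~", round(overall, 1))
```

In this sketch the accuracy is close to full double precision at x = 1 and deteriorates steadily as x approaches 0, so the single figure reported for the domain is dictated by the worst point.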

It is possible that in some regions/points of the domain, the function might
not be numerically computable with an accuracy which is acceptably good for a
specified precision (word length). Then these points may be termed
computational/numerical singular points with respect to the specified precision. These are
often mathematical singular points as well.

Need to compute the computational error and complexity Whenever one does
a numerical computation, it is imperative to obtain a logical estimate of the
quality (implying the computational error or, reciprocally, computational
accuracy) of the solution throughout the domain and the cost (implying the computational
complexity or, equivalently, amount/time of computation) of the solution involved.

Not only for one’s own satisfaction but also for the satisfaction of others, it is
necessary that these two parameters, viz., the quality and the cost, are specified with any
computation that one carries out.

Unfortunately, these two parameters are often not computed in a
formal way. Many people simply accept the solution if they like it in the context.

When these two parameters are obtained/computed along with the solution, anybody
will be logically as well as scientifically convinced about the quality and the cost of
the solution. They will also be able to compare their computational algorithm
with other existing algorithms for the same purpose. They may
even improve their algorithm based on the knowledge of the pros and cons of the
existing ones.

They will be able to place full confidence in the computation. Just doing the
computation and getting a solution without justifying the cost and quality will result in a
shaky feeling without much/any confidence about the solution.

Even in the simplest case, such as solving the linear system $Ax = b$, a numerical
solution $x_c$ computed using a fixed precision will not, in general, satisfy $Ax = b$
exactly, i.e., $Ax_c \neq b$. It is necessary to compute the concerned relative error to
determine the quality of the computed solution $x_c$. The problem of knowing the
quality and even correctness of the computed solution $x_c$ to ensure the acceptability
of the solution in the context is more involved for other problems such as linear
optimization as well as nonlinear optimization problems.
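
A minimal Python sketch of such a quality check is given below. It is an illustration under stated assumptions, not the authors' solver: the 8 x 8 Hilbert matrix is a hypothetical ill-conditioned test case, the solver is plain Gaussian elimination with partial pivoting, and exact rational arithmetic (the standard `fractions` module) is used only to measure the residual and the error of the double precision solution without adding further rounding.

```python
from fractions import Fraction

def solve(A, b):
    """Gaussian elimination with partial pivoting (floats or Fractions)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Hypothetical ill-conditioned test system: Hilbert matrix, b close to A * ones.
n = 8
A = [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]
b = [sum(row) for row in A]

x_c = solve(A, b)  # computed solution in double precision

# Exact arithmetic on the same stored data, used only for measurement.
A_ex = [[Fraction(v) for v in row] for row in A]
b_ex = [Fraction(v) for v in b]
x_ex = solve(A_ex, b_ex)
x_cf = [Fraction(v) for v in x_c]

residual = [b_ex[i] - sum(A_ex[i][j] * x_cf[j] for j in range(n))
            for i in range(n)]
rel_residual = float(max(map(abs, residual)) / max(map(abs, b_ex)))
rel_error = float(max(abs(u - v) for u, v in zip(x_cf, x_ex))
                  / max(map(abs, x_ex)))

print("relative residual:", rel_residual)
print("relative error   :", rel_error)
```

The residual is nonzero, confirming that the computed solution does not satisfy the system exactly. For an ill-conditioned matrix the relative error is typically much larger than the relative residual, which is why the residual alone is not a sufficient measure of quality.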

Likewise, for a univariate equation $f(x) = 0$, where $f(x)$ is a polynomial or a
transcendental function, the scalar solution $x_c$ will not satisfy the equation exactly
in general, i.e., $f(x_c) \neq 0$.
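
A small Python sketch makes this concrete for the hypothetical equation f(x) = x^2 - 2 = 0, whose positive root is the irrational number sqrt(2). The choice of equation and the variable names are assumptions for illustration; the residual is evaluated exactly with rational arithmetic so that no extra rounding hides the effect.

```python
import math
from fractions import Fraction

def f(x):
    # f(x) = x^2 - 2; works for floats and for exact Fractions
    return x * x - 2

x_c = math.sqrt(2.0)             # computed root in double precision

float_residual = f(x_c)          # residual evaluated in double precision
exact_residual = f(Fraction(x_c))  # residual of the stored double, exact

# First-order error estimate |f(x_c)| / |f'(x_c)| with f'(x) = 2x
error_estimate = abs(float(exact_residual)) / (2.0 * x_c)

print("f(x_c) in doubles :", float_residual)
print("f(x_c) exactly    :", float(exact_residual))
print("error estimate    :", error_estimate)
```

Since sqrt(2) is irrational, no double precision number can be the exact root, so the exact residual is necessarily nonzero, although it is of the order of the machine precision here.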

Sometimes, wherever experimental verification is possible/available, the
experimentalist/researcher is satisfied whenever the computed solution matches
the experimental output. Such a match definitely improves one’s confidence in
the correctness of the computation. Nevertheless, it is still imperative to scientifically
explore both computational accuracy and computational/time complexity. The habit 
of such an exploration on the part of the concerned researcher could be a boon for 
them in terms of sufficiently minimizing the probability of a failure in a real world 
environment. 

Surgical bombing to destroy a terrorist camp, accurate landing of a satellite on a
specified location of the red planet Mars, and destroying a menacingly approaching
asteroid by hitting it accurately with a missile from the earth to save the lives of millions of
people are some of the vital outcomes of sufficiently accurate computations in
real time.

In NP-hard problems such as the traveling salesman problem (TSP) of
reasonable size (say, 50 cities), knowing the quality of a computed solution (computed
stochastically using simulated annealing, an ant colony approach, or a genetic algorithm)
deterministically is impossible since the TSP is intractable both for finding the best
solution (shortest path) and for verifying whether a computed solution really
represents the shortest path. We can, however, get some statistical estimate of how good a
solution of the TSP is [1-4].

However, whenever there is scope for an experiment in a real-world situation or
for knowing the real-world outcome, one may verify the numerical solution of the
mathematical model (problem) against that outcome.

Sometimes, however, there may be no scope for carrying out repeated
experiments because of the prohibitive cost involved in each experiment, such as the cost
of hitting a distant target using several missiles, each having a different
elevation and speed, in order to determine the elevation and speed parameters
that succeed in hitting the target appropriately.

An experiment may give researchers some satisfaction and allow them to rely on the
solution. Such satisfaction is definitely necessary and good. But it
is not the same as, or on a par with, that due to the knowledge of computational error
and computational complexity. While the latter provides full conviction and
satisfaction, the former does not.

For a real-world application, e.g., a surgical strike to destroy a specific
fixed target, the related computations, often real-time computations, must be of
a convincing quality (sufficiently good accuracy) so that the strike does not do any
harm to unintended targets.

As yet another example, we may have an inactive heavy artificial satellite heading
toward the earth. This is considered a potentially dangerous object that has the
capability to destroy a densely populated area of a city somewhere on the earth. The dead
satellite needs to be either guided to land in a safe area of an ocean or destroyed using
a missile in mid-air/space so that no harm is caused to life and property on earth. The
concerned real-time computation should be convincingly accurate, within the
time constraint, and should yield adequate speed and direction for the missile.

It may not be out of context to narrate the following true incident. The code
name given to the military operation to intercept and destroy a non-functioning U.S. 
National Reconnaissance Office (NRO) satellite named USA-193 was Operation 
Burnt Frost (OBF). The launch occurred on February 20, 2008 at about 10:26 p.m. 
EST from the USS Lake Erie. A Standard Missile-3 (SM-3) was used to shoot down 
the satellite. 

Only a few minutes after launch, the SM-3 intercepted its target and successfully
completed its mission by neutralizing the potential dangers that the errant satellite
originally posed. While the threat was mitigated, OBF has received much scrutiny
from other countries, primarily from China and Russia.

Uniform order of accuracy required throughout under standard fixed precision
In our undergraduate physics laboratory, we determine the diameter of a given
sphere using a tool such as the slide (vernier) caliper or the screw gauge. Associated
with each of these tools, there is a fixed order of error or, reciprocally, a fixed order of 
accuracy. We take several measurements of the diameter and then take the average to 
reduce the measurement error [5] and produce the numerical value of the diameter. 

An ordinary 6 in. (152.4 mm) digital caliper, for instance, is made of stainless 
steel and has a rated accuracy of 0.001 in. (0.0254 mm) and a resolution of 0.0005 
in. (0.0127 mm). 

The order of error associated with all the measurements is the same, as the tool
used is the same for all measurements. Further, the errors in these measurements are
statistically normally distributed and hence the average will produce the most likely
value of the diameter. Observe that the exact diameter is never known. The only thing
that we know is a narrow (or, possibly, the narrowest) interval (based on the specified
measuring device) in which the exact diameter lies.

If the order of error in an instrument/measuring device differs from one
measurement to another, then this mismatch will not allow us to justify the average as the most
likely value of the diameter. Fortunately, this is not the situation in a real-world
environment since any measuring device always has a fixed order of relative error,
say $0.5 \times 10^{-4}$ [5]. Thus we see that a uniform or same order of error is important and
required throughout.

In numerical computation, we use a fixed precision of computation throughout
all the regions constituting the domain. In some regions, the accuracy is very
good/excellent. In some other regions, it is good while, in the rest, it is bad. One
way is to adopt a different precision of computation for different regions:
appropriately higher precision for bad regions, high precision for not so bad regions, and
ordinary precision for good/excellent regions. This will ensure uniform accuracy of
computation/output (except at the numerically singular points of a region).

But, in general, such multi-precision computation for solving a particular
numerical problem is not available, or certainly not automatically available, to the user.
Practically no computer has automatic precision change for executing a numerical
algorithm according to the need for uniform accuracy throughout the complete
domain. A software/firmware implementation of automatic precision change is quite
involved and is normally not done in a computer.
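
Although the precision is not changed automatically, it can be raised by hand in software for a thorny region. The following Python sketch, based on the standard `decimal` module, is only an illustration of the idea: the test function sqrt(x^2 + 1) - x, the two sample points, and the rule `digits_for` that chooses the number of digits are all hypothetical and are not part of the chapter.

```python
import math
from decimal import Decimal, localcontext

def f_double(x):
    # sqrt(x^2 + 1) - x in ordinary double precision (about 16 digits)
    return math.sqrt(x * x + 1.0) - x

def f_decimal(x, digits):
    # Same formula, evaluated with `digits` significant decimal digits
    with localcontext() as ctx:
        ctx.prec = digits
        xd = Decimal(x)  # exact conversion of the double
        return (xd * xd + 1).sqrt() - xd

def digits_for(x):
    # Hypothetical rule for x > 0: the subtraction cancels roughly
    # 2*log10(x) leading digits for large x, so add that many guard digits.
    return 16 + 2 * max(0, math.ceil(math.log10(x)))

good, thorny = 1.0, 1.0e8

naive = f_double(thorny)                         # fixed precision
raised = f_decimal(thorny, digits_for(thorny))   # region-specific precision

print("double precision at x = 1e8 :", naive)
print("raised precision at x = 1e8 :", raised)
print("double precision at x = 1   :", f_double(good))
```

At x = 1 the ordinary precision is adequate, whereas at x = 1e8 the double precision result loses all significant digits and only the raised precision recovers a value close to 1/(2x). The price is the extra cost of software arithmetic, which is one reason why such switching is not done routinely.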

However, the standard 15 digit precision provided by Matlab is sufficiently good 
for practically all real-world problems in all regions of a domain including the spiky 
regions (excluding, of course, the numerical/computational singular points). 

We usually need 4 significant digit accuracy for the final engineering implementation,
since practically no measuring device has an order of accuracy greater than
4 significant digits. Although we desire uniform accuracy throughout the specified
domain (ideally 15 digits), it is enough to ensure 4 significant digit accuracy
throughout the domain.

Hence it is assumed that we do have a fixed, reasonably good precision of
computation, say, 15 digits as is provided by Matlab, a widely used very high-level
programming language for scientific and engineering computations. Based on this
assumption we present a few problems, called ill-conditioned problems, in the areas
of linear algebra/optimization, polynomial root-clusters, and ordinary differential
equations, for which computations face pronounced errors in some of the regions,
called the thorny regions [6].

Having a fixed precision computation throughout the execution of the algorithm, 
it is possible that in well-conditioned (good) regions, the algorithm will produce 
results with high order of accuracy, say 8 significant digits. In the ill-conditioned 
(bad) regions, it will produce results with low order of accuracy, say, 3 significant 
digits. Thus the overall order of accuracy for the whole result will be considered only 
3 significant digits not only for further computation based on these results but also 
for any engineering implementation. 

It can be seen that an arithmetic operation, for instance, the operation involving two
operands, one having a precision of 8 significant digits and the other 3 significant
digits, outputs a result that will have no more than 3 significant digit accuracy. The
engineers and scientists who are only interested in getting some acceptable numerical
results may have no interest or time to learn and write program(s) to achieve uniform
accuracy all over the domain under a (fixed) standard precision.
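
The loss of accuracy in such a mixed operation can be checked with a short Python sketch. The operands (pi and e), the helper `round_sig`, and the choice of multiplication as the operation are assumptions made for illustration only.

```python
import math

def round_sig(x, digits):
    # Round x to the given number of significant decimal digits.
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)

def rel_err(approx, exact):
    return abs(approx - exact) / abs(exact)

a, b = math.pi, math.e   # hypothetical "exact" operands
a8 = round_sig(a, 8)     # operand known to 8 significant digits
b3 = round_sig(b, 3)     # operand known to 3 significant digits

err_a = rel_err(a8, a)
err_b = rel_err(b3, b)
err_product = rel_err(a8 * b3, a * b)

print("relative error of a8      :", err_a)
print("relative error of b3      :", err_b)
print("relative error of product :", err_product)
```

The relative error of the product is practically equal to that of the less accurate operand, so the 8 digit accuracy of the other operand is wasted.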

Ideally they should ensure that an accuracy of at least 4 significant digits is
achieved throughout the domain excluding, however, computational singular points.
These points must be excluded from any implementation, whether engineering or
non-engineering (such as experimental).

The uniformity of accuracy over the whole domain is more relevant while solving
differential equations having singularities. Close to singular points of the domain,
the order of accuracy tends to drop sharply while it remains reasonably high
elsewhere. So far as solving a linear system Ax = b is concerned, the
uniformity of accuracy for the system for different right-hand-side vectors will remain
invariant, unlike that in solutions of differential equations, for instance, a system of
ordinary differential equations with a singularity.

In Sect. 2, we discuss ill-posed linear algebraic equations as well as linear
optimization problems while in Sect. 3, we deal with a nonlinear equation (polynomial)
in one variable having root-clusters/multiple roots. In both sections we consider
typical numerical examples which need special care in thorny regions so that uniform
accuracy or the best possible accuracy is obtained.

In fact, if the accuracy of solution in some regions, under a fixed precision of 
computation, is high and in other regions, it is low, then such a non-uniformity of 
accuracy of the solution over the whole domain is undesirable. Further, such a non- 
uniform solution is not desirable to be used as input for further computation since 
the resultant accuracy will be the lowest accuracy that is attainable/attained. 

We discuss, in Sect. 4, singular nonlinear boundary value problems associated 
with an ordinary differential equation with application in a real world problem. We 
include conclusions in Sect. 5. 


2 Near-Consistent Linear Systems: Nclinsolver Versus 
Matlab Solvers 


A near-consistent linear system solver named as nclinsolver is compared with widely 
used Matlab solvers for a linear system Ax = b. The scope of these solvers is also 
discussed. It is demonstrated that for near-singular/ill-conditioned singular consistent 
systems, the nclinsolver is better than or comparable to widely used Matlab solvers 
involving inv, pinv and A\b so far as the quality of the solution x is concerned. 

In other words, the numerical computation of interest is the one that provides 
reasonably good accuracy even where the solution domain is highly sensitive. The 
order of accuracy in the most sensitive region of the domain will be considered the 
order of accuracy of the domain. When this order of accuracy is the same as that in 
the least sensitive region of the domain, then we have achieved the desired uniform 
accuracy throughout. 

Consider a system of linear equations Ax = b, where A is an m × n numerically 
known matrix and b is an m × 1 numerically known vector. From the number of 
solutions point of view, Ax = b will have (i) no solution (the system is inconsistent), 
(ii) only one non-trivial (trivial) solution (the matrix A is nonsingular and hence 
square while b is non-null (null)), or (iii) infinity of solutions (the system is 
consistent and the matrix A is singular square or non-square rectangular). 

Although case (i) does not have any solution that satisfies the equations Ax = b, 
we will solve a nearest consistent system which is not the (ever-consistent) normal 
equation $A^t A x = A^t b$. In addition, we will also focus on cases (ii) and (iii). 

There are many real world problems whose mathematical models give rise to (a) 
ill-conditioned linear systems (i.e., the coefficient matrix A is near-singular, implying 
that two or more rows of A are near-linearly dependent), (b) singular linear systems 
(A is singular and all its linearly independent rows are sufficiently linearly 
independent), or (c) ill-conditioned singular linear systems (A is singular and two or 
more of its strictly linearly independent rows are near-linearly dependent) [7-11]. 
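
To make the effect of near-linearly dependent rows concrete, the following Python sketch solves a 2 × 2 system exactly with rational arithmetic. The matrix, the value of delta, and the helper solve_2x2 are illustrative assumptions chosen here, not data from the chapter.

```python
from fractions import Fraction

def solve_2x2(a, b):
    """Solve a 2x2 system exactly by Cramer's rule (entries are Fractions)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x1 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x2 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x1, x2]

delta = Fraction(1, 10000)
A = [[Fraction(1), Fraction(1)],
     [Fraction(1), 1 + delta]]               # second row nearly equals the first

b = [Fraction(2), 2 + delta]                 # consistent with x = [1, 1]
b_perturbed = [Fraction(2), 2 + 2 * delta]   # last entry changed by only 1e-4

x = solve_2x2(A, b)
x_perturbed = solve_2x2(A, b_perturbed)

print([float(v) for v in x])            # [1.0, 1.0]
print([float(v) for v in x_perturbed])  # [0.0, 2.0]
```

A change of 10^-4 in one right-hand-side entry moves the solution from [1, 1] to [0, 2], which is the kind of sensitivity meant by ill-conditioning with respect to the solution.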

Deterministic methods have been extensively used over past decades and are being 
extensively used at present for both large and not so large systems. There are many 
linear systems arising out of real-world problems that occur in most scientific and 
engineering disciplines in a regular way [7-14]. Most of these linear systems are 
small ones but they could be too ill-conditioned to be solved with desired accuracy 
depending on the precision of the computer. 

Thus, there is a significant scope for nclinsolver [10, 11] besides the solvers 
available in Matlab for our daily/frequent use. The nclinsolver1 specifically has a 
broader scope in some sense and produces a better quality solution to an 
ill-conditioned singular system than that output by a Matlab command such as pinv, 
inv (when applicable), and A\b. 

Although nclinsolver1 is not designed for the minimum-norm least-squares 
solution, it does provide a sufficiently accurate minimum-norm least-squares solution 
for a singular consistent system within the precision used. 

Several physically concise algorithms have been recently developed [7-9]. All 
these algorithms are deterministic. However, some are iterative while others are 
non-iterative. 

We present clinsolver for consistent singular/near-singular systems followed 
by a general version of this solver, named nclinsolver1, for near-consistent 
singular/near-singular systems. This general version produces a solution of the 
consistent system closest to a near-consistent one in some sense. Also stated in this 
section is the information provided by Matlab for users on the commands 
pinv(A) * b, inv(A) * b, and A\b for computing the solution vector x of the system 
Ax = b. 

We provide test examples to demonstrate that the order of the quality (order of 
accuracy) of the solution produced by nclinsolver1 is better than or comparable to 
those produced by the foregoing three Matlab commands for linear systems with two 
or more rows near-linearly dependent. If the rows of a singular system are exactly 
linearly dependent without any near-linearly dependent rows, then the singular 
system is well-conditioned with respect to a solution. No numerical instability 
resulting in more error occurs. 

NC-LINSOLVER for Ax = b We first present C-LINSOLVER (meant for 
consistent system) and then its more general version NC-LINSOLVER (meant for 
near-consistent systems) for the linear system Ax = b. 

A mathematical model (e.g., a system of linear equations) arising out of 
a real world problem or, equivalently a mathematical problem derived out of 
nature/physical world can never be inconsistent. This is because inconsistency is 
completely unknown in nature. 

If there is any inconsistency in a mathematical model, then it could be due to 
measurement error/limitation of a measuring device or due to human error/mistake. 

If there is a significant inconsistency or, equivalently significant contradiction 
among the equations of the model, then it is necessary to fall back to the model 
to find out the reason of unacceptable inconsistency (contradiction). Although one 
can always find a solution, e.g., a least-squares solution or the minimum-norm least- 
squares solution accurately, such a solution is not likely to be useful. It may prompt 
one to draw wrong inferences in a real-world or engineering environment. 

If, however, the inconsistency is acceptably small, then a traditional least-squares 
(or the minimum-norm least-squares) solution is good enough. Alternatively, a 
non-traditional solution of a nearest (in some sense) consistent system could be of 
use in practice. 

We include in this section both the mathematical version and the computational 
version of NC-LINSOLVER along with the Matlab codes. The proof follows 
from constructing the orthogonal basis. 


(a) Mathematical C-LINSOLVER The $O(mn^2)$ physically concise algorithm 
MATHEMATICAL C-LINSOLVER [10, 11] for the consistent linear system Ax = b 
is as follows. Let $a_i^t$ be the ith row of A. Then $a_i$ is the column vector, i.e., 
the ith row (of A) written as a column. In-built in the algorithm are (i) the solution 
vector obtained without explicitly computing the minimum-norm least-squares 
inverse $A^+$ and (ii) the rank r of A. 


(* MATHEMATICAL C-LINSOLVER *) 

P := I; x := 0; r := 0; 
for i := 1 to m do begin EQN(a_i, b_i, P, x); r := r + c end 

procedure EQN(a, b, P, x); (* solves one equation a^t x = b *) 
begin 
  c := 0; u := Pa; v := ||u||^2; s := b - a^t x; 
  if v ≠ 0 then begin P := P - u u^t / v; x := x + s u / v; c := 1 end 
  else if s ≠ 0 then Ax = b is inconsistent (contradictory) and terminate 
end; 
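
The steps above can be sketched in Python. Because the mathematical version tests v and s against an exact zero, the sketch uses exact rational arithmetic (fractions.Fraction); the function name c_linsolver and the decision to raise an exception on inconsistency are assumptions made for this illustration, not the authors' code.

```python
from fractions import Fraction

def c_linsolver(A, b):
    """Exact-arithmetic sketch of the mathematical C-LINSOLVER.

    Returns (x, P, r); raises ValueError if Ax = b is inconsistent.
    """
    m, n = len(A), len(A[0])
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # P = I
    x = [Fraction(0)] * n
    r = 0
    for i in range(m):
        a = [Fraction(v) for v in A[i]]
        u = [sum(P[p][q] * a[q] for q in range(n)) for p in range(n)]  # u = P a
        v = sum(t * t for t in u)                                      # v = ||u||^2
        s = Fraction(b[i]) - sum(a[q] * x[q] for q in range(n))        # s = b - a^t x
        if v != 0:
            # The row brings new information: update x and deflate P.
            x = [x[p] + s * u[p] / v for p in range(n)]
            P = [[P[p][q] - u[p] * u[q] / v for q in range(n)] for p in range(n)]
            r += 1
        elif s != 0:
            # The row depends on earlier rows but contradicts them.
            raise ValueError("Ax = b is inconsistent (contradictory)")
    return x, P, r

x, P, r = c_linsolver([[1, 2, 3, 4], [5, 6, 7, 8]], [10, 26])
print([str(t) for t in x], r)   # ['1', '1', '1', '1'] 2
```

Each row costs one matrix-vector product and one rank-one update of the n × n matrix P, which is where the $O(mn^2)$ operation count comes from.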


Computational C-LINSOLVER Since the numerical zero in a floating-point 
arithmetic is not the same as the mathematical zero [10-12], replace, in 
MATHEMATICAL C-LINSOLVER, 

\[
v \neq 0 \ \text{and} \ s \neq 0 \quad \text{by} \quad
v > 0.5 \times 10^{-4}\, \bar{a}, \quad |s| > 0.5 \times 10^{-4}\, \bar{b},
\]

respectively, to obtain COMPUTATIONAL C-LINSOLVER for four significant digit 
accuracy, where 

\[
\bar{a} = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| \,/\, (mn), \qquad
\bar{b} = \sum_{i=1}^{m} |b_i| \,/\, m.
\]
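
As a small worked instance of these thresholds, the Python sketch below computes the average magnitudes of the entries of A and b and the resulting numerical-zero tolerances for the data A = [1 2 3 4; 5 6 7 7], b = [10 25]'. The function name average_magnitudes and the variable names tol_v and tol_s are assumptions introduced for this illustration.

```python
def average_magnitudes(A, b):
    """Return (abar, bbar): mean absolute value of the entries of A and of b."""
    m, n = len(A), len(A[0])
    abar = sum(abs(v) for row in A for v in row) / (m * n)
    bbar = sum(abs(v) for v in b) / m
    return abar, bbar

A = [[1, 2, 3, 4], [5, 6, 7, 7]]
b = [10, 25]
abar, bbar = average_magnitudes(A, b)

tol_v = 0.5e-4 * abar   # v below this is treated as a numerical zero
tol_s = 0.5e-4 * bbar   # |s| below this is treated as a numerical zero

print(abar, bbar)       # 4.375 17.5
print(tol_v, tol_s)
```

Scaling the tolerances by the average magnitudes makes the zero test relative to the size of the data rather than absolute.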


(b) The mathematical NC-LINSOLVER It is the concise linear system solver for 
near-consistent system. If the system is too inconsistent, then it is necessary 
to reexamine the physical model and the derived mathematical model to find 
out the real cause for excessive inconsistency (contradiction) and rectify the 
errors/mistakes. 


It is not advisable to proceed with solving (in the least-squares sense) such highly 
inconsistent systems since the resulting (least-squares or minimum-norm least-squares) 
solution may convey a wrong message to the physical world. We present a 
mathematical version of the NC-LINSOLVER as well as its computational version with 
a Matlab program for ready use (just by copying and pasting). 

It is a modified version of C-LINSOLVER, which provides a solution of the 
consistent system closest (in the sense of the minimum variation/modification of the 
right-hand side vector b componentwise as needed by the modified version) to the 
given near-consistent/inconsistent system Ax = b along with a relative error-bound 
of the solution for the inconsistent system. 

On completion of the execution of NC-LINSOLVER, we obtain a solution x for 
the consistent system $Ax = b + \Delta b$ and the projection operator (matrix) 
$P = I - A^+ A$ that provides a solution Pz (where z is an arbitrary (null or non-null) 
column vector) of the homogeneous linear system Ax = 0. 

Further, we obtain the rank r of A and the modification $\Delta b$ of b. 
$\Delta b \neq 0$, or, equivalently, $\Delta b_i \neq 0$ for some i, tells us that the 
given Ax = b is inconsistent. The inconsistency index 
$inci = \|\Delta b\| / \|A, b\|$ as well as a relative error 
$err = \|b - Ax\| / \|x\|$ of the solution x are also produced. $\|A, b\|$ denotes a 
norm (e.g., the Euclidean norm) of the augmented matrix (A, b). 


(* MATHEMATICAL NC-LINSOLVER *) 

P := I; x := 0; Δb := 0; r := 0; 
(* abar := (sum over i = 1..m, j = 1..n of |a_ij|) / (mn); bbar := (sum over i = 1..m of |b_i|) / m *) 
(* abar, bbar are exactly the same as mentioned and not needed in this version. *) 

for i := 1 to m do 
begin 
  u := P a_i; v := ||u||^2; s := b_i - a_i^t x; c := 0; 
  if v = 0 and s ≠ 0 then 
    begin print 'Ax=b is strictly inconsistent'; Δb_i := -s; b_i := b_i + Δb_i; s := 0 end 
  else if v = 0 and s = 0 then Δb_i := 0; 
  if v ≠ 0 then 
    begin x := x + u s / v; P := P - u u^t / v; c := 1; Δb_i := 0 end; 
  r := r + c 
end; 
inci := ||Δb|| / ||A, b||; err := ||b - Ax|| / ||x||; 
print A, b, Δb, x, P, r, inci, err; 


Computational NC-LINSOLVER The following modifications in the foregoing 
mathematical NC-LINSOLVER give us the computational version. 

• Remove (* and *) from the comment that defines abar and bbar so that abar and 
  bbar are computed. 
• Replace v = 0 by v < 0.5 × 10^-4 abar, v ≠ 0 by v > 0.5 × 10^-4 abar, s = 0 by 
  |s| < 0.5 × 10^-4 bbar, and s ≠ 0 by |s| > 0.5 × 10^-4 bbar. 

Computational NC-LINSOLVER (Matlab program) The following program, 
viz. nclinsolver1, is self-explanatory. 


function []=nclinsolver1(A, b) 
%NC-LINSOLVER: Near-consistent Linear System Solver 
[m, n]=size(A); 
'A and b of the system Ax=b are', A, b, 
P=eye(n); sd=0; x(1:n)=0; x=x'; 
delb(1:m)=0; delb=delb'; bo=b; r=0; 
%abar, bbar: average magnitudes of the elements of A and of b 
abar=0; for i=1:m, for j=1:n, abar=abar+abs(A(i,j)); 
end; end; 
abar=abar/(m*n); 
bbar=0; for i=1:m, bbar=bbar+abs(b(i)); end; 
bbar=bbar/m; 
for i=1:m 
u=P*A(i,:)'; v=norm(u)^2; s=b(i)-A(i,:)*x; c=0; 
%row i dependent on earlier rows but contradicting them: modify b(i) 
if v<=.00005*abar & abs(s)>=.00005*bbar, 
delb(i)=-s; sd=-s; b(i)=b(i)+delb(i); s=0; 
elseif v<=.00005*abar & abs(s)<=.00005*bbar; 
delb(i)=0; end; 
%row i independent of earlier rows: update x and P 
if v>=.00005*abar, x=x+u*s/v; P=P-u*u'/v; c=1; 
delb(i)=0; end; 
r=r+c; 
end; 
if abs(sd)>.00005*(abar+bbar)*0.5, 
'The system Ax=b is inconsistent.', 
end; 
inci=norm(delb)/norm([A,b]); 
err=norm(bo-A*x)/norm(x); 
'The projection operator P = (I - A^+A) is', P, 
'The rank of the matrix A is', r, 
'The inconsistency index is', inci, 
'Modification in vector b, i.e., Db is', delb, 
'Vector b of the nearest consistent system is', b, 
'Solution vector of the nearest consistent system is', x, 
'Error in the solution vector x is', err 

%Issuing the Matlab command 
%>>clear all; close all; A=[1 2 3 4;5 6 7 7]; b=[10 25]'; nclinsolver1(A,b) 
%we obtain the result 
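
For readers who want to trace the computational solver outside Matlab, here is a plain Python sketch of the same steps. It is an illustrative re-expression under stated assumptions, not the authors' program: the function name nc_linsolver and the parameter tol (playing the role of the constant 0.00005) are introduced here, the three Matlab tests are folded into one if/elif, and the inconsistency index is omitted because it needs a matrix norm.

```python
import math

def nc_linsolver(A, b, tol=0.5e-4):
    """Floating-point sketch of the computational NC-LINSOLVER.

    Returns (x, P, r, delb, b_new, err).
    """
    m, n = len(A), len(A[0])
    abar = sum(abs(v) for row in A for v in row) / (m * n)
    bbar = sum(abs(v) for v in b) / m
    P = [[float(i == j) for j in range(n)] for i in range(n)]   # P = I
    x = [0.0] * n
    delb = [0.0] * m
    b_new = [float(v) for v in b]
    r = 0
    for i in range(m):
        a = [float(v) for v in A[i]]
        u = [sum(P[p][q] * a[q] for q in range(n)) for p in range(n)]
        v = sum(t * t for t in u)
        s = b_new[i] - sum(a[q] * x[q] for q in range(n))
        if v > tol * abar:
            # Row i is numerically independent of the earlier rows.
            x = [x[p] + u[p] * s / v for p in range(n)]
            P = [[P[p][q] - u[p] * u[q] / v for q in range(n)] for p in range(n)]
            r += 1
        elif abs(s) > tol * bbar:
            # Row i is dependent but contradictory: move b(i) onto the consistent value.
            delb[i] = -s
            b_new[i] += delb[i]
    residual = [b[i] - sum(A[i][q] * x[q] for q in range(n)) for i in range(m)]
    norm_x = math.sqrt(sum(t * t for t in x))
    err = math.sqrt(sum(t * t for t in residual)) / norm_x if norm_x > 0 else float("nan")
    return x, P, r, delb, b_new, err

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [10, 26, 42, 57]
x, P, r, delb, b_new, err = nc_linsolver(A, b)
print(r, [round(t, 4) for t in x], [round(t, 4) for t in delb], round(err, 4))
```

With this data the last equation contradicts the first two, so the sketch reports rank 2, changes the last right-hand-side entry from 57 to 58, and returns a solution close to [1 1 1 1].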


Problem 1 Executing the foregoing program with the data specified, i.e., issuing the 
commands 

>> clear all; close all; A = [1 2 3 4;5 6 7 7]; b = [10 25]'; nclinsolver1(A,b) 

we obtain the following result. The matrix A and vector b of the system Ax = b as 
well as the projection operator $P = I - A^+ A$ (retaining 4 digits) are 

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 7 \end{bmatrix}, \quad
b = \begin{bmatrix} 10 \\ 25 \end{bmatrix}, \quad
P = \begin{bmatrix}
 0.3986 & -0.3913 & -0.1812 &  0.2319 \\
-0.3913 &  0.6812 & -0.2464 & -0.0580 \\
-0.1812 & -0.2464 &  0.6884 & -0.3478 \\
 0.2319 & -0.0580 & -0.3478 &  0.2319
\end{bmatrix}.
\]


The rank of the matrix A is r = 2. 

The inconsistency index is inci = 0. 

Modification in vector b, i.e., Db is delb = [0 0]', where the superscript ‘t’ denotes 
the transpose throughout the chapter. 

Vector b of the nearest consistent system is b = [10 25]'. 

Solution vector of the nearest consistent system is 

x = [0.9420 1.0145 1.0870 0.9420]', 

Error in the solution vector x is err = 1.9896e-015. 


Problem 2 Executing the foregoing nclinsolver1 program with the data specified as 
in the following Matlab commands 

>> clear all; close all; A = [1 2 3 4;5 6 7 8]; b = [10 26]'; nclinsolver1(A,b) 

we obtain the following result. The matrix A and vector b of the system Ax = b as 
well as the projection operator $P = I - A^+ A$ are 

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix}, \quad
b = \begin{bmatrix} 10 \\ 26 \end{bmatrix}, \quad
P = \frac{1}{10} \begin{bmatrix}
 3 & -4 & -1 &  2 \\
-4 &  7 & -2 & -1 \\
-1 & -2 &  7 & -4 \\
 2 & -1 & -4 &  3
\end{bmatrix}.
\]


The rank of the matrix A, the inconsistency index of the system Ax = b, and the 
modification in vector b viz. delb are exactly the same as those in Problem 1. Hence 
the vector b of the nearest consistent system is b =[10 26]' (unchanged). 

Solution vector of the nearest consistent system is 
x = [1.0000 1.0000 1.0000 1.0000]'. 

Relative error in the solution vector x is err = 0. 

It can be seen that the solution vector of each of the foregoing 2 problems is the 
minimum norm solution vector. Since both the systems are consistent, the respective 
solution vector satisfies the equation Ax = b numerically and these vectors are also 
the minimum norm least squares solution vectors. 

Observe that although x = [1.0000 1.0000 1.0000 1.0000]' is definitely a solution 
of the first problem (Problem 1), it is not its minimum norm solution. In many 
real-world problems such as Production allocation, Diet, Blending, Transportation, 
Aircraft assignment, and Work-hour scheduling problems, the minimum norm least 
squares inverse and hence the solution (out of infinity of possible solutions) are 
required and are used instead of just any one of the infinity of generalized inverses 
and any one of the infinity of solutions [12, 15, 16]. 
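
The distinction between "a solution" and "the minimum norm solution" for Problem 1 can be verified exactly. The Python sketch below uses the standard formula x = A^t (A A^t)^{-1} b, which applies when A has full row rank (true for this A); the variable names and the use of Cramer's rule for the 2 × 2 Gram system are choices made for this illustration.

```python
from fractions import Fraction

A = [[1, 2, 3, 4], [5, 6, 7, 7]]
b = [10, 25]

# Gram matrix G = A A^t (2 x 2); solve G y = b exactly by Cramer's rule.
g11 = sum(v * v for v in A[0])
g12 = sum(p * q for p, q in zip(A[0], A[1]))
g22 = sum(v * v for v in A[1])
det = Fraction(g11 * g22 - g12 * g12)
y1 = (b[0] * g22 - g12 * b[1]) / det
y2 = (g11 * b[1] - g12 * b[0]) / det

# Minimum norm solution x = A^t y, and the competing solution of all ones.
x_min = [A[0][j] * y1 + A[1][j] * y2 for j in range(4)]
x_ones = [Fraction(1)] * 4

def residual(x):
    """Return A x - b as exact fractions."""
    return [sum(A[i][j] * x[j] for j in range(4)) - b[i] for i in range(2)]

norm_sq_min = sum(v * v for v in x_min)
norm_sq_ones = sum(v * v for v in x_ones)

print([round(float(v), 4) for v in x_min])      # [0.942, 1.0145, 1.087, 0.942]
print(residual(x_min) == [0, 0], residual(x_ones) == [0, 0])   # True True
print(float(norm_sq_min), float(norm_sq_ones))
```

Both vectors satisfy Ax = b exactly, but the squared norm of x_min is slightly below 4, the squared norm of the all-ones vector.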


Problem 3 If we now issue the Matlab commands 

>> clear all; A = [1 2 3 4;5 6 7 8;9 10 11 12;13 14 15 16]; b = [10 26 42 57]'; 
nclinsolver1(A, b) 

entering these in the Matlab command window in 1 line, then we get the following 
solution. The matrix A and vector b of the system Ax = b as well as the projection 
operator $P = I - A^+ A$ are 

\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{bmatrix}, \quad
b = \begin{bmatrix} 10 \\ 26 \\ 42 \\ 57 \end{bmatrix}, \quad
P = \frac{1}{10} \begin{bmatrix}
 3 & -4 & -1 &  2 \\
-4 &  7 & -2 & -1 \\
-1 & -2 &  7 & -4 \\
 2 & -1 & -4 &  3
\end{bmatrix}.
\]

The system Ax = b is inconsistent. 


The projection operator $P = I - A^+ A$ is the same as the P in Problem 2. 
The rank of the matrix A is r = 2. 

The inconsistency index is inci = 0.0116. 

Modification in vector b, i.e., Db is delb = [0 0 0 1]'. 

Vector b of the nearest consistent system is b = [10 26 42 58]'. 
Solution vector of the nearest consistent system is 

x = [1.0000 1.0000 1.0000 1.0000]' (the minimum norm solution). 
Relative error in the solution vector x is err = 0.5000. 


The minimum norm least squares solution of the foregoing inconsistent system 
is x = [0.8650 0.9425 1.0200 1.0975]', obtained using the Matlab command 
>> x = pinv(A)*b. Since the system is inconsistent, the solution will not satisfy the 
equation. However, ||x|| (Euclidean norm, say) is minimum, implying a minimum norm 
solution x, and ||Ax - b|| is also a minimum, implying a least squares solution x. 

These two minimizing properties put together characterize the minimum norm 
least squares solution (unique). The command >> x = pinv(A)*b always produces 
the minimum norm least squares solution, which exists for any matrix A and any 
vector b as long as their dimensions are compatible. 

To obtain a least squares solution of an inconsistent system using the 
NC-LINSOLVER, just set $A := A^t A$, $b := A^t b$; the inversion-free algorithm then 
provides us the least-squares solution since the normal equation $A^t A x = A^t b$ is 
always consistent irrespective of whether Ax = b is consistent or not. 
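
The consistency of the normal equation can be checked numerically for the data of Problem 3. The Python sketch below forms B = A^t A and c = A^t b and substitutes the minimum norm least squares solution quoted above; the names B, c, x_ls and the two residual lists are introduced here for illustration only.

```python
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [10, 26, 42, 57]
m, n = len(A), len(A[0])

# Normal equation data: B = A^t A (n x n) and c = A^t b (n x 1).
B = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]

# Minimum norm least squares solution reported for this system.
x_ls = [0.8650, 0.9425, 1.0200, 1.0975]

normal_residual = [sum(B[i][j] * x_ls[j] for j in range(n)) - c[i] for i in range(n)]
original_residual = [sum(A[i][j] * x_ls[j] for j in range(n)) - b[i] for i in range(m)]

print(c)                                             # [1259, 1394, 1529, 1664]
print(max(abs(t) for t in normal_residual))          # at rounding-error level
print([round(t, 4) for t in original_residual])      # nonzero: Ax = b is inconsistent
```

The vector x_ls satisfies the normal equation to rounding error while leaving a nonzero residual in the original, inconsistent system.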

The following Matlab commands (which are self-explanatory) 

>> clear all; A = [1 2 3 4;5 6 7 8;9 10 11 12;13 14 15 16]; b = [10 26 42 57]'; 
B = A'*A, c = A'*b, nclinsolver1(B, c) %Entered in 1 line in Matlab command window 

produce on execution the following output. 
The projection operator P is 

\[
P = \begin{bmatrix}
 0.3000 & -0.4000 & -0.1000 &  0.2000 \\
-0.4000 &  0.7000 & -0.2000 & -0.1000 \\
-0.1000 & -0.2000 &  0.7000 & -0.4000 \\
 0.2000 & -0.1000 & -0.4000 &  0.3000
\end{bmatrix}.
\]


The rank of the matrix B is r = 2. 

The inconsistency index is inci = 0 

Modification in vector c is delc = [0 0 0 0]'. 

Vector c of the nearest consistent system is c = [1259 1394 1529 1664]'. 
Solution vector of the nearest consistent system is 

x = [0.8650 0.9425 1.0200 1.0975]'. 

Error in the solution vector x is err = 1.6321e-013. 


However, a system that is too inconsistent (too contradictory) needs to be 
reexamined for possible mistakes and corrected before using the solver. We will 
certainly get the correct least-squares solution of the highly inconsistent system by 
the solver, but such a solution may not be useful for a real world problem. 

High inconsistency implies huge contradiction among specified information 
of the problem, whose mathematical model is the linear system. This inconsistency 
could happen due to human mistake or error in the concerned measuring device. 
It must be noted that any physical problem derived out of nature (real world) will 
be totally contradiction-free if the derived physical problem exactly represents the 
natural problem (possibly without any assumption since assumptions do inject 
error). 

Matlab commands (information for users provided by Matlab) 

(i) pinv The Moore-Penrose inverse (also called the pseudoinverse). X = 
pinv(A) produces the unique matrix X of the same dimension as A' so that 
AXA = A, XAX = X, (AX)' = AX, (XA)' = XA. If A is complex then 
A*X and X*A will be Hermitian. The computation is based on SVD(A) 
and any singular values less than a tolerance are treated as zero. The default 
tolerance is max(size(A)) * norm(A) * eps(class(A)). pinv(A, tol) uses the 
tolerance tol instead of the default. 

(ii) inv Matrix inverse. X = inv(A) is the inverse of the square non-singular 
matrix A. A warning message is printed if A is badly scaled or near-singular. 
In practice, it is seldom necessary to form the explicit inverse of a matrix. A 
frequent misuse of inv arises when solving the system of linear equations. 

(iii) x = A\b Matrix division operator. One way to solve the linear system Ax = b 
is with x = inv(A)*b. A better way, from both an execution time and numerical 
accuracy standpoint, is to use the matrix division operator \: x = A\b. This 
produces the solution using Gauss elimination, without forming the inverse. 

If A is an m × n matrix where m ≠ n and B is a column vector with m components, 
or a matrix with several such columns, then X = A\B is the solution in the least 
squares sense to the under- or over-determined system of equations AX = B. The 
effective rank r of A is computed from the QR decomposition with pivoting. A 
solution X is computed that has at most r nonzero components per column. If r < n, 
then this is usually not the same solution as pinv(A)*B, which is the least squares 
solution with the smallest norm. 

Example (from Matlab) that includes suggestion from Matlab The following example 
demonstrates the difference between solving a linear system by inverting the matrix 
with inv(A)*b and solving it directly with A\b. A random matrix A of order 600 is 
constructed so that its condition number cond(A) = 10^10 and its norm norm(A) is 1. 

Condition number with respect to matrix inversion cond(X) returns the 2-norm 
condition number (viz. the ratio of the largest singular value of X to its smallest). A 
large condition number indicates that the matrix X is near singular. 

The exact solution x is a random vector of length 600 and the right-hand side 
vector is b = A*x. Thus the system of linear equations is badly conditioned, but 
consistent. 

It takes 2 to 3 times as long to compute the solution with y = inv(A)*b as with 
z= A\b. 

The following Matlab commands (self-explanatory) 

>> clear all; A = rand(600); A = A/norm(A); b = sum(A'); b = b'; t = cputime; 
x = inv(A)*b; t = cputime-t 

entered in 1 line in the Matlab command window produce t = 0.1716 s. Similarly, 
we have 

>> clear all; A = rand(600); A = A/norm(A); b = sum(A'); b = b'; t = cputime; 
x = A\b; t = cputime-t 

t = 0.0624 s, 

>> clear all; A = rand(600); A = A/norm(A); b = sum(A'); b = b'; t = cputime; 
x = inv(A)*b; t = cputime-t 

t = 0.1092 s, 

>> clear all; A = rand(600); A = A/norm(A); b = sum(A'); b = b'; t = cputime; 
x = inv(A)*b; t = cputime-t 

t = 0.1248 s, and 

>> clear all; A = rand(600); A = A/norm(A); b = sum(A'); b = b'; t = cputime; 
x = A\b; t = cputime-t 

t = 0.0468 s. 

Both produce computed solutions with about the same error, 10^-6, reflecting 
the condition number of the matrix. Observe that at different points in time, the 
rand command produces different random matrices, unlike any deterministic 
computation. 

For instance, >>A = rand(600); c = cond(A) produces condition numbers 


c = 1.4783e + 004, c = 3.9374e + 004, c = 2.0677e + 004, 
c = 3.9374e + 004, c = 2.0677e + 004 


in successive runs. Although these condition numbers are not often the same because 
the random matrix produced changes from one run to another, their order is the 
same, viz. 10^4. 

Also, the foregoing differences in time complexity are due to the fact that the 
machine executes different sets of internal instructions at different times. 

The direct solution produces residuals on the order of the machine accuracy, even 
though the system is not so well conditioned. The behavior of this example is typical. 
Using A\b instead of inv(A) * b is around 2 times as fast and outputs residuals on 
the order of machine accuracy, relative to the magnitude of the data. 


2.1 Numerical Experiments 


We have carried out numerical experiments on typical problems including both 
well-conditioned singular systems and ill-conditioned singular systems including 
near-singular ones using the following Matlab program nclinsolver1invpinvinvfree. 
The program obtains the relative error in each of four solution procedures, viz., 
nclinsolver1, inv, pinv, and invfree (A\b) methods. 


% Computational NC-LINSOLVER (Matlab program) The following program 
% is self-explanatory. 
function []=nclinsolver1invpinvinvfree(k,A,b) 
format long g; 
[m, n]=size(A); 
e=ones(n, 1); en=norm(e); 
%NC-LINSOLVER: Near-consistent Linear System Solver 
%'The matrix A and vector b of the system Ax=b are', A,b, 
P=eye(n); sd=0; x(1:n)=0; x=x'; delb(1:m)=0; delb=delb'; bo=b; r=0; 
abar=0; for i=1:m, for j=1:n, abar=abar+abs(A(i,j)); end; end; abar=abar/(m*n); 
bbar=0; for i=1:m, bbar=bbar+abs(b(i)); end; bbar=bbar/m; 
for i=1:m 
u=P*A(i,:)'; v=norm(u)^2; s=b(i)-A(i,:)*x; c=0; 
if v<=.5*10^-14*abar & abs(s)>=.5*10^-14*bbar, delb(i)=-s; sd=-s; b(i)=b(i)+delb(i); 
s=0; 
elseif v<=.5*10^-14*abar & abs(s)<=.5*10^-14*bbar; delb(i)=0; end; 
if v>=.5*10^-14*abar, x=x+u*s/v; P=P-u*u'/v; c=1; delb(i)=0; end; r=r+c; 
end; 
if abs(sd)>.5*10^-14*(abar+bbar)*0.5, 'The system Ax=b is inconsistent.', end; 
inci=norm(delb)/norm([A,b]); 
%err=norm(bo-A*x)/norm(x); 
%'The projection operator P = (I - A^+A) is', P, 
'The rank of the matrix A is', r, 
%'The inconsistency index is', inci, 
%'Modification in vector b, i.e., Db is', delb, 
%'Vector b of the nearest consistent system is', b, 
%'Solution vector of the nearest consistent system is', x, 
%'Error in the solution vector x is', err 
%relative errors with respect to the exact solution e = [1 1 ... 1]' 
errlinsolver=norm(e-x)/en; x=inv(A)*b; errinv=norm(e-x)/en; 
x=pinv(A)*b; errpinv=norm(e-x)/en; x=A\b; errinvfree=norm(e-x)/en; 
display('k errlinsolver errinv errpinv errinvfree'); 
[k errlinsolver errinv errpinv errinvfree] 


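(i) Consider the progressively near-singular system 

\[
Ax = b, \qquad
A = \begin{bmatrix}
1 + 0.5 \times 10^{-k} & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12 \\
13 & 14 & 15 & 15
\end{bmatrix}, \quad
b = \begin{bmatrix} 10 + 0.5 \times 10^{-k} \\ 26 \\ 42 \\ 57 \end{bmatrix}.
\tag{1}
\]
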
For k = 1(1)13, we execute the Matlab function subprogram 
nclinsolver1invpinvinvfree by issuing the Matlab commands 

>> clear all; close all; format long g; for k = 1:13, A = [1 + 0.5*10^-k 2 3 4;5 6 7 
8;9 10 11 12;13 14 15 15]; b = (sum(A'))'; b, nclinsolver1invpinvinvfree(k,A,b); 
end; 

in one line in the Matlab command window. Observe that the right-hand side vector b 
has been chosen as the column vector of row sums of the matrix A. For instance, 
b = [10.05 26 42 57]' when k = 1. 


The true (exact) solution is x = [1 1 1 1]' for each of the foregoing 13 systems. 
Table 1 depicts the relative error in the solution vector produced by each of the four 
methods, viz., (i) nclinsolver1, (ii) Matlab inv, (iii) Matlab pinv, and (iv) Matlab 
inv-free (Gauss elimination A\b) methods, all embedded in the function subprogram 
nclinsolver1invpinvinvfree. The precision of computation used is 15 digits. It can 
be seen that as k (finite positive integer) increases, the singularity becomes more 
pronounced although the system remains mathematically strictly non-singular. It 
may be noted that in mathematics as well as in nature, the precision of computation 
used is infinity of digits, which cannot be captured and will never be captured by any 
digital computer of today as well as of tomorrow. 
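
The statement that the system stays strictly non-singular for every finite k can be confirmed with exact rational arithmetic. The Python sketch below is an illustration; the helper det_exact is an assumption introduced here. It evaluates the determinant of the matrix in (1) exactly for a few values of k.

```python
from fractions import Fraction

def det_exact(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n, det = len(M), Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [M[r][j] - factor * M[col][j] for j in range(n)]
    return det

for k in (1, 5, 13):
    eps = Fraction(5, 10 ** (k + 1))     # 0.5 * 10^-k, held exactly
    A = [[1 + eps, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 15]]
    print(k, det_exact(A))               # 1/5, 1/50000, 1/5000000000000
```

The determinant equals 2 × 10^-k, which is nonzero for every finite k but shrinks below what a 15-digit computation can resolve reliably as k approaches 13.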

From Table 1, it can be seen that for k = 13 the system Ax = b has been treated 
as singular by the Matlab pinv-based method. Consequently the method produced 
the minimum norm (least-squares) solution, which in our example coincides almost 
perfectly with the exact solution, viz., x = e = [1 1 1 1]'. Such a coincidence may 
not always occur, but whatever solution, termed as the minimum-norm least-squares 
solution, that we get will be significantly accurate and will be sufficiently good for 
real world applications. Observe that for k = 11, 12, the Matlab inv, pinv, and 
inv-free (A\b) methods produce considerable error in the solution. 

This is mainly because the Matlab inv, pinv, and inv-free methods have treated the 
system as very ill-conditioned (with respect to the solution) but not as 
numerically/computationally exactly singular. 

If we compare the Matlab pinv against the Matlab inv and inv-free 
methods, then the Matlab inv is more desirable to be used when the singularity 
is not very pronounced as it will produce a more accurate solution than that produced 
by the Matlab pinv- as well as invfree-based methods (see, e.g., k = 1, 2, 3, 4). 

Even the Matlab inversion-free method, which is the Gauss elimination method, 
will be more or less equally good as the Matlab pinv-based solution when the system 
is not too singular. See, for example, Table 1, where both pinv- and inv-free methods 
provide numerically identical relative error for all k, viz. k = 1, 2, 3, ..., 13. 

However, it would not be incorrect to say that the order of the relative error in 
nclinsolver is less or comparable to those in other 3 Matlab methods viz. inv-, pinv-, 
and inv-free (A\b)-methods. Except the output results, there is no way to compare the 
3 Matlab routines against nclinsolver more deeply as the software code implemented 
in Matlab is beyond the user’s domain. 

Since the standard precision used in Matlab is 15 digits, it is not very meaningful 
to go beyond k = 13, since the first element of b, viz. 10 + 0.5 × 10^-13 = 
10.00000000000005, is approximated as 10.0000000000001, which could produce 
a numerical error not acceptable in this sensitive comparative study. 

Even for k = 12, the corresponding element, which should be exactly 
10.0000000000005, is approximated as 10.000000000001, which could result in 
unacceptable error. However, for k = 1, 2, 3, ..., 11, this element is exactly 
computed and output by the computer. 


(ii) We now consider an ill-conditioned singular system Ax = b, where two rows 
are near-linearly dependent and the coefficient matrix A is singular. Let 

\[
A = \begin{bmatrix}
1 & 2 & 3 & 4 \\
2 & 4 & 6 & 8 \\
9 & 10 & 11 & 12 \\
13 & 14 & 15 & 16 + 0.5 \times 10^{-k}
\end{bmatrix}, \quad
b = \begin{bmatrix} 10 \\ 20 \\ 42 \\ 58 + 0.5 \times 10^{-k} \end{bmatrix}.
\]


For k = 1(1)12, Matlab will treat the matrix as singular with strictly three linearly 
independent rows, i.e., of rank 3 or, in other words, as having three near-linearly 
dependent rows. This implies that the system Ax = b is ill-conditioned singular. 

For k = 13(1)14, Matlab will treat A as singular with only two rows linearly 
independent, i.e., with rank 2, since the standard precision used in Matlab is 15 
digits. The system in the latter case will be treated as a well-conditioned singular 
system, the same as the one where the term 0.5 × 10^-k is completely absent. 

Issuing the following Matlab commands, where the right-hand side vector b is the 
column vector having row sums of the matrix A as its elements, 

>> clear all; close all; format long g; for k = 1:14, A = [1 2 3 4;2 4 6 8;9 10 11 
12;13 14 15 16 + 0.5*10^-k]; b = (sum(A'))'; b, nclinsolver1invpinvinvfree(k, 
A, b); end; 

we obtain the relative error in the solution vector x of the system Ax = b as in 
Table 2. 

Since the matrix A is singular, the true inverse does not exist, so Matlab inv is not
applicable. The relative error in Matlab pinv (pinv(A)) and that in the Matlab matrix
inversion-free (invfree) method (A\b) have been found to be the same. However, the
relative error in nclinsolver1 differs from that in the pinv/invfree method. The accuracy
of the solution vector produced by nclinsolver1 is consistently better than that produced
by the pinv or invfree method. For k = 1(1)12, the rank r is reported as 3 by Matlab,
while for k = 13, 14 it is 2.


Table 2 Relative error in the solution vector x (since A is singular, inv is non-existent)

k    errnclinsolver           errpinv                  errinvfree
1    1.29584965577794e-013    2.54211497292521e-013    2.54211497292521e-013
2    1.29712573544371e-012    3.19134161318702e-012    3.19134161318702e-012
3    1.92296268638356e-016    1.78223835448675e-011    1.78223835448675e-011
4    1.92296268638356e-016    2.66741012162287e-010    2.66741012162287e-010
5    1.92296268638356e-016    3.1237508787305e-009     3.1237508787305e-009
6    1.92296268638356e-016    2.26600562419676e-008    2.26600562419676e-008
7    1.92296268638356e-016    4.05355409048064e-007    4.05355409048064e-007
8    1.92296268638356e-016    1.06624029994001e-006    1.06624029994001e-006
9    1.92296268638356e-016    4.17879148487218e-005    4.17879148487218e-005
10   1.92296268638356e-016    0.00030517578125         0.00030517578125
11   1.92296268638356e-016    0.0019683246455807       0.0019683246455807
12   1.92296268638356e-016    0.01171875               0.01171875
13   1.92296268638356e-016    1.13220977340074e-015    1.13220977340074e-015
14   1.92296268638356e-016    7.38527833091255e-016    7.38527833091255e-016
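The pinv column of such a comparison can be approximated with built-in commands only. The following is a minimal sketch and not the authors' comparison routine: it assumes that the reference solution is the vector of ones (b holds the row sums of A) and that the relative error is measured as ||x - xref||/||xref||, which may differ from the definition used for Table 2.

```matlab
% Sketch: relative error of the pinv-based solution for the singular test matrix.
% Assumptions (not from the text): xref = ones(4,1) is the reference solution
% and the relative error is norm(x - xref)/norm(xref).
kmax = 14;
errpinv = zeros(kmax, 1);
for k = 1:kmax
    A = [1 2 3 4; 2 4 6 8; 9 10 11 12; 13 14 15 16+0.5*10^(-k)];
    b = sum(A, 2);               % column vector of row sums
    xref = ones(4, 1);           % one exact solution of Ax = b
    x = pinv(A)*b;               % minimum-norm least-squares solution
    errpinv(k) = norm(x - xref)/norm(xref);
end
format long g
disp([(1:kmax)' errpinv]);
```

The vector of ones is orthogonal to the null space of A here, so it coincides with the minimum-norm solution that pinv aims at; this is what makes it a reasonable reference in the sketch.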


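(iii) We now consider yet another ill-conditioned near-singular (strictly non-singular)
system Ax = b, where the rows are near-linearly dependent. Let

$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 + 0.5 \times 10^{-k} \end{bmatrix}, \qquad
b = \begin{bmatrix} 12 \\ 15 \\ 18 + 0.5 \times 10^{-k} \end{bmatrix}.$$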


Issuing the following Matlab commands, where b is the column vector having the
row sums of A as its elements,


> clear all; close all; format long g; for k = 1:14, A = [1 4 7;2 5 8;3 6 9+0.5*10^-k];
b = (sum(A'))'; b, nclinsolver1invpinvinvfree(k,A,b); end;


we obtain the required relative error as depicted in Table 3. It can be seen that as
k (a finite positive integer) increases, the singularity becomes more pronounced,
although the system remains mathematically strictly non-singular, implying that
the rank of A is 3. Given a computational precision of 15 digits, the numerical
rank reported by Matlab is 3 up to k = 12. For k = 13(1)14, the numerical rank
reported by Matlab is 2.
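The dependence of the numerical rank on k can be inspected directly. The sketch below is only an illustration using the built-in rank command with its default tolerance; it does not involve nclinsolver1.

```matlab
% Sketch: numerical rank reported by the built-in rank command for k = 1..14.
kmax = 14;
r = zeros(kmax, 1);
for k = 1:kmax
    A = [1 4 7; 2 5 8; 3 6 9+0.5*10^(-k)];
    r(k) = rank(A);              % default tolerance: max(size(A))*eps(norm(A))
end
disp([(1:kmax)' r]);
```

The rank drops once the smallest singular value of A falls below the default tolerance, which is how a mathematically non-singular matrix comes to be reported as rank-deficient.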

Once again the relative errors in the solution vector produced by pinv and by
invfree (A\b) are the same. Since the matrix is strictly non-singular, Matlab inv has
performed better than (or as good as) pinv and invfree up to k = 12, when the precision
is 15 digits. So Matlab inv is recommended instead of Matlab pinv or the Matlab matrix
division operator A\b for the computation of the solution vector x for k = 1(1)12.
The accuracy of the solution vector x output by nclinsolver1 is, over the whole
range (i.e., for k = 1(1)14), better than or comparable to that produced by the
Matlab commands inv, pinv, and A\b (invfree).

The graph of the relative errors in the solution vector computed using
nclinsolver1, Matlab inv, Matlab pinv (pseudoinverse), and Matlab invfree (A\b)
is shown in Fig. 1. The graph has been produced by the following Matlab program
(script file) errornclinsolverinvpinv.


function [] = errornclinsolverinvpinv
% Plots the natural log of the relative errors against k.
k=1:1:13; k=k';
e1=[4.3671e-13 1.4071e-12 4.2768e-12 2.2092e-10 9.7881e-14 1.2454e-12 ...
7.5205e-12 5.0187e-11 1.6945e-13 4.4500e-12 2.2998e-11 2.5950e-10 1.6945e-13];
e2=[9.7881e-14 1.2454e-12 7.5205e-12 5.0187e-11 2.6015e-9 1.0836e-8 ...
8.7177e-8 2.5079e-6 1.3551e-5 0.0003 0.0018 0.0151 0.1182];
e3=[1.6945e-13 4.4500e-12 2.2998e-11 2.5950e-10 1.7341e-9 4.0223e-8 2.0483e-7 ...
6.2945e-6 1.3947e-5 5.40880e-6 0.0012 0.0524 8.7023e-015];
% The three curves follow the order of the legend entries.
plot(k, log(e1), k, log(e2), '--', k, log(e3), '-.'); xlabel('k'); ylabel('natural log of errors');
title('Error pattern in nclinsolver1 Matlab inv and Matlab pinv/invfree');
legend('nclinsolver1', 'Matlab inv', 'Matlab pinv/invfree');

Fig. 1 Relative error pattern in the solution vector in the natural logarithmic scale using (i)
nclinsolver1, (ii) Matlab inv, and (iii) Matlab pinv/invfree


From Fig. 1, as k increases the degree of singularity also increases. For 1 ≤ k ≤ 7,
nclinsolver1 and Matlab pinv/invfree have more or less the same relative error,
while Matlab inv is more accurate than any of the remaining three methods. For
k = 1(1)12, the numerical rank of the matrix is reported by Matlab as 3 (which is also
the mathematical rank), while for k = 13, 14 it is reported as 2. When the singularity
is not too pronounced, the use of inv or nclinsolver1 is recommended for computing
a solution vector. For 7 < k ≤ 14, nclinsolver1 has significantly less relative error
than Matlab pinv/invfree. Matlab inv fluctuates in relative error and is mostly less
accurate than nclinsolver1. The fluctuation in the error observed in Matlab inv may
be attributed to rounding error due to division by small numbers in a finite precision
computational environment. This is because the inverse tends not to exist numerically
when the singularity is highly pronounced relative to the precision. When the linear
system is close to a non-singular system, as well as when it is close to an exactly
singular system with respect to the precision, the relative errors in nclinsolver1 are
very low or negligible. On the other hand, the relative errors are high in the middle,
i.e., when the system is near-singular with respect to the precision.


2.2 Remarks: Matlab Commands, Large Systems, Pruning,
Rank, and Complexity


Matlab commands are intrinsically excellent and ideally suited for non-programmers.
The codes used by Matlab for the inv, pinv, and inversion-free A\b commands
have been improved over the years to such an extent that these commands produce the
best result with the best time complexity, subject to the inherent limitations of the
numerical method on which each command is based. These commands
are extensively used by scientists, engineers, and researchers and are designed
for general usage. Matlab is known to be an excellent, user-friendly, very high-level
programming language ideally suited for scientific and technological communities
who do not have to spend significant effort and time on formally learning a
programming language.

Nclinsolver1 is shown to be better than or comparable to the Matlab commands
for solution of consistent systems with near-linearly dependent rows. The purpose is
to compare the algorithm nclinsolver1 with the Matlab solvers (commands),
specifically for ill-conditioned singular/near-singular systems, only in terms of the
quality (accuracy) of the solution vector. In the comparison, we attempted to establish,
through typical numerical test examples, the possible scope of nclinsolver1, keeping
in view the extensively used Matlab commands inv, pinv, and inversion-free
A\b, so far as the quality of the solution is concerned for a specified precision. From our
numerical experiment, we find that nclinsolver1 is better than or at least comparable
to the Matlab solvers for near-singular/ill-conditioned singular systems. In addition to
producing the solution vector, nclinsolver1, in the process of its execution, provides
a number of related useful parameters such as the orthogonal projection operator
$P = I - A^{+}A$, the rank, the degree of inconsistency (i.e., the inconsistency index), and
a nearest consistent system (for an inconsistent system). However, the Matlab commands
are superbly coded by the concerned software engineers within the limitation
of the numerical method(s) employed. The code (program), however, is not accessible
to users.

Large sparse linear systems. We have not explicitly addressed large sparse
systems with a coefficient matrix of order 500 or more. For such systems, an appropriate
sparse technique needs to be superimposed on nclinsolver1 to derive the best
possible solution in terms of accuracy and complexity, both storage and computational.
Nevertheless, nclinsolver1 programmed in Matlab has a significant scope
for small and mid-sized ill-conditioned singular systems. While we come across
some large sparse systems, we often come across, in science and engineering, many
small and mid-sized linear systems. So nclinsolver1 could be used freely for a
best-quality solution, specifically for the ill-conditioned ones. Since nclinsolver1
is numerically stable by virtue of its construction using an orthogonal process,
it is expected to perform very well for large dense systems. However, we have not
tried a truly large dense system because of the executable memory limitation.

Pruning linearly dependent rows. For a consistent system Ax = b, the best option
will be to identify the linearly dependent rows of the augmented matrix [A, b] and
prune them, leaving only the linearly independent rows. Such a row pruning operation
is just an integral part of the solver with no significant extra computational effort [9,
13, 14, 17]. The resulting solution involves less computation, less storage, and hence
less error.

Matlab implementation for pruning. The following Matlab program, which is
self-explanatory,


(i) computes the orthogonal projection operator $P = I - A^{+}A$ of the matrix
A, where $A^{+}$ is the Moore-Penrose inverse (equivalently, the minimum-norm
least-squares inverse) of the rectangular matrix A,

(ii) produces a solution to the system if the system is consistent,

(iii) prunes the redundant (linearly dependent) rows of the system if the system is
consistent, and depicts the rank of the matrix A,

(iv) displays the pruned system, where the pruned matrix is evidently of full row
rank and the system is consistent,

(v) provides a solution to the system, and

(vi) also produces a nontrivial solution of the system if the system is homogeneous,
i.e., if Ax = 0 (null column vector).


function pruningbasedlinsolver(A,b);
% Solves Ax=b by successive orthogonal projection and prunes redundant rows.
B=A; c=b; rrow=0;
[m, n]=size(A); p=0;
x=zeros(n,1); r=0; P=eye(n); k=1;
abar=sum(sum(abs(A)))/(m*n); bbar=sum(abs(b))/m;
for i=1:m
    a=A(i,:)'; brow=b(i); u=P*a; v=(norm(u))^2;
    inconsistency=brow-a'*x;
    if abs(v) >= 0.00005*abar % For 4 significant digit accuracy
        P=P-u*u'/v; x=x+inconsistency*u/v; r=r+1;
    else
        if abs(inconsistency) > 0.00005*bbar
            disp('Linear system Ax=b is inconsistent.'); p=1;
            break;
        else
            % Store indices of redundant rows in a vector.
            redrow(k)=i; k=k+1; rrow=1;
        end
    end
end
disp('The orthogonal projection operator P=I-A^+A is'); P
% If the system is homogeneous, i.e. if Ax=0 (always consistent)
% a solution is x=Pz, where z is an arbitrary column vector.
if bbar==0
    z=rand(n,1); disp('A nontrivial solution to Ax=0 is'); P*z
end
% Prune the redundant rows of the augmented matrix (A, b).
if rrow==1
    c(redrow)=[]; B(redrow,:)=[];
end
% Display the results.
if p~=1
    S=size(B);
    if S(1)<=m
        disp('A solution to the system is '); disp(x);
        disp('The rank of the matrix A is '); disp(r);
        disp('The pruned system Bx=c has B and c as'); B, c
    else
        disp('The solution to the system is '); disp(x);
    end
end


We illustrate the foregoing algorithm (Matlab program) with the following numerical
example, viz., the 5 × 4 consistent matrix equation. The command (to be typed
in 1 line) in the Matlab command window is

> A = [1 2 3 4;2 4 6 8;3 6 9 12;1 1 1 1;1 -2 1 -5], b = [10 20 30 4 -5]', pruningbasedlinsolver(A,b)


The result that we get is as follows.


A =
     1     2     3     4
     2     4     6     8
     3     6     9    12
     1     1     1     1
     1    -2     1    -5

b = [10 20 30 4 -5]'


The orthogonal projection operator P = I - A^+A is

P =
    0.3000   -0.4000   -0.1000    0.2000
   -0.4000    0.5333    0.1333   -0.2667
   -0.1000    0.1333    0.0333   -0.0667
    0.2000   -0.2667   -0.0667    0.1333

A solution to the system is [1 1 1 1]'
The rank of the matrix A is 3
The pruned system Bx = c has B and c as

B =
     1     2     3     4
     1     1     1     1
     1    -2     1    -5

c = [10 4 -5]'
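The quantities in this example can be cross-checked independently with built-in commands. The sketch below is such a check under the assumption that pinv and rank are acceptable references; it is not part of pruningbasedlinsolver.

```matlab
% Sketch: cross-check of the 5 x 4 example using built-in commands only.
A = [1 2 3 4; 2 4 6 8; 3 6 9 12; 1 1 1 1; 1 -2 1 -5];
b = [10 20 30 4 -5]';
P = eye(4) - pinv(A)*A;      % orthogonal projector onto the null space of A
x = pinv(A)*b;               % minimum-norm solution of the consistent system
r = rank(A);                 % numerical rank
format short
disp(P); disp(x'); disp(r);
```

Since the null space of A is spanned by the single vector [3 -4 -1 2]', the projector is the outer product of that vector with itself divided by 30, which agrees with the entries displayed in the example.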


Rank computation. The Matlab rank command for the rank computation of a
matrix is excellent, better than the rank produced by nclinsolver1. However,
nclinsolver1 is neither focused on nor oriented toward accurate rank computation,
although the rank is produced as a by-product of the computation without any
extra effort.

Computational complexity. Both the Matlab solvers and nclinsolver1 have
the same order of computational complexity, viz., $O(n^3)$ or $O(mn^2)$, where A in
the system Ax = b is an n × n or an m × n matrix. However, the Matlab solvers
are so coded/implemented (unknown to users) that the time complexity is noticeably
minimized. One may use the Matlab command cputime, which registers the CPU time
when executed. At some point in the Matlab program one can use the command t =
cputime, and at some later point (or at the end of the program) one can obtain the time
used up as t = cputime-t. This used-up time provides us the time complexity of the
concerned computation. Observe that the time complexity is directly proportional
to the computational complexity (amount or, equivalently, cost of computation).
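As a minimal sketch of this timing pattern, the following times a single solve with A\b. The matrix size and the test matrix are illustrative assumptions chosen only so that the system is safely non-singular.

```matlab
% Sketch: timing a solve with cputime (size and data are illustrative).
n = 200;
A = rand(n) + n*eye(n);      % strictly diagonally dominant, hence non-singular
b = sum(A, 2);               % row sums, so the exact solution is ones(n,1)
t = cputime;                 % CPU time at the start
x = A\b;
t = cputime - t;             % CPU time used by the solve, in seconds
fprintf('CPU time used: %g s\n', t);
```

For very short computations the resolution of cputime can be too coarse to be informative; repeating the computation in a loop, or using tic/toc or timeit, are common alternatives.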


2.3 Linear Optimization Problems with Near-Linearly 
Dependent Constraints 


A linear optimization problem or, equivalently, a linear program (LP) is defined
as Max c'x subject to Ax = b, x ≥ 0 (null column vector), where
$c = [c_1\ c_2\ \cdots\ c_n]'$ is a numerically specified column vector, $A = [a_{ij}]$ is a given
m × n numerical matrix, and $b = [b_1\ b_2\ \cdots\ b_m]'$ is a numerically specified column vector.
This problem has all the pitfalls of the foregoing ill-conditioned linear systems. The
equality constraint Ax = b will have (i) no solution, or (ii) only one solution, or (iii)
infinitely many solutions. If Ax = b has no solution, then the concerned LP has no solution.
If Ax = b has only one solution, then the concerned LP has only this one solution if
this solution is non-negative; otherwise the LP has no solution. If Ax = b has infinitely
many solutions, then the concerned LP will choose one of these solutions using the
objective function Max c'x, provided there exists a non-negative solution. This last case
is almost always encountered in an LP, while the former two cases are practically
non-existent in an LP.

The simplex algorithm, which is exponential-time in the worst case but has behaved
like a polynomial-time algorithm in the real-world problems encountered so
far, is essentially a Gaussian type of algorithm in terms of row operations. It thus
has all the pitfalls when the linear constraints are ill-conditioned, e.g., when the
constraints are near-linearly dependent.

The simplex algorithm is an exterior point method and starts with a known initial
feasible solution, as is the case with most methods (exterior or interior point)
for linear optimization problems. An interior point method such as the Karmarkar
projective transformation method for an LP [15, 16] is an iterative polynomial-time
method, like any method for LP excluding the exponential-time one based on
the fundamental theorem of linear programming. The Karmarkar method converges very
slowly and has the same pitfalls as the simplex algorithm, though less pronounced,
when the constraints are near-linearly dependent.

Concise Algorithm for Linear Programs: Monotonic Convergence, Basic
Variables, Boundedness. Let the LP be

Minimize objective function c'x subject to equality constraints Ax = b, x ≥ 0
(non-negativity constraints),


where $A = [a_{ij}]$ is a specified m × n numerical matrix of rank m, $c = [c_1, c_2, \ldots, c_n]'$
is a given numerical cost vector, $b = [b_1, b_2, \ldots, b_m]'$ is a given numerical response
vector, and $x = [x_1, x_2, \ldots, x_n]'$ is the solution vector to be computed numerically.

The equality constraint Ax = b together with the non-negativity condition x ≥ 0
(null column vector) of the LP may be viewed geometrically as an n-dimensional
(n-D) polytope (convex n-D solution space). This polytope is a proper subset
of the n-D [18] solution space defined by Ax = b and lies in the 1st
hyper-quadrant of dimension n of the solution space.

If the polytope is empty, then the LP is infeasible implying the existence of 
contradictory equality constraints or violation of a non-negativity constraint. 

If the polytope consists of only one point, then this point will be the optimal 
solution of the LP where the objective function has no role to play. Such a situation 
in an LP is practically non-existent. 

The only remaining possibility is that the polytope, bounded/closed or unbounded,
has infinitely many solutions and not just two or three or any other finite number of
solutions. This possibility is almost always encountered in solving an LP.

If the polytope is bounded, i.e., a closed convex region, then one of its infinitely
many points, including the corner points as well as points on the boundary, will minimize
the objective function c'x.

However, the corner points of the polytope (each of which is a solution,
called a basic feasible solution; these basic feasible solutions are finite in number
but could be combinatorially very many) constitute a finite (not infinite) number of
potential solution vectors. One or more of these corner points will minimize
the objective function c'x according to the fundamental theorem of linear
programming.

If the polytope is unbounded, then the objective function may have no finite
minimum value.

It may be seen that the foregoing LP is exactly the same as the LP


Maximize −c'x subject to Ax = b, x ≥ 0


(c, A, b, and x are as defined in the LP) so far as the solution vector x is concerned.

The LP reduces to just solving a linear system mathematically non-iteratively
once we know the basic variables that minimize the objective function c'x. The
foregoing linear system can also be solved mathematically iteratively, but we restrict
ourselves to the non-iterative solution.

Setting the non-basic variables to zero and deleting the column vectors of the
coefficient matrix A corresponding to the non-basic variables, we obtain a square
nonsingular system Bx = b, x ≥ 0. Solving this system will readily produce the optimal
basic feasible solution.
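The reduction to a square system can be sketched in a few lines. The data below and the index set basic are hypothetical and serve only to illustrate the column deletion and the solve.

```matlab
% Sketch: solve the shrunk system once the basic variables are known.
% The data and the index set "basic" are hypothetical.
A = [1 1 1 0; 1 3 0 1];
b = [4; 6];
basic = [1 2];               % assumed indices of the basic variables
B = A(:, basic);             % delete the columns of the non-basic variables
xB = B\b;                    % square non-singular system B*xB = b
x = zeros(size(A, 2), 1);
x(basic) = xB;               % non-basic variables stay at zero
disp(x');
```

The non-negativity of xB still has to be checked, since an arbitrary choice of m columns need not give a feasible point.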

The simplex method, an exterior method, computes only an optimal basic
feasible solution. It does not compute an optimal non-basic feasible solution as is
often produced by an interior method like the Karmarkar method and its variants
[15, 19-22].

In the n-D Cartesian coordinate system, we have $2^n$ hyper-quadrants. The 1st
hyper-quadrant consists of points, each of which is viewed as an n-D vector with all
elements non-negative. According to the fundamental theorem of linear programming,


(i) if there is a feasible solution then there is a basic feasible solution and
(ii) if there is an optimal feasible solution then there is an optimal basic feasible
solution.


A non-basic feasible solution x is defined as one in which the number of positive 
components/elements in x is more than the number of basic variables. 
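This counting definition translates directly into a check. The following sketch classifies two hypothetical feasible points of a small system; the threshold tol used to decide whether a component is positive is an assumption.

```matlab
% Sketch: classify feasible points as basic or non-basic by counting
% positive components (data and tolerance are hypothetical).
A = [1 1 1 0; 1 3 0 1];
b = [4; 6];
m = size(A, 1);                  % number of basic variables
tol = 1e-10;                     % assumed threshold for "positive"
X = [3 1 0 0; 1 1 2 2]';         % two feasible points, one per column
isNonBasic = false(1, size(X, 2));
for j = 1:size(X, 2)
    x = X(:, j);
    feasible = norm(A*x - b) <= tol && all(x >= 0);
    isNonBasic(j) = feasible && (sum(x > tol) > m);
end
disp(isNonBasic);
```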

An interior point method could produce either an optimal basic feasible solution
or an optimal non-basic feasible solution, i.e., a solution that has more variables
with positive values than the number of basic variables. In a real-world
situation, both have their scope. While the optimal solution need not be unique,
the optimal value of the objective function will always be unique.

Here we present the polynomial-time iterative-cum-non-iterative algorithm for
the LP along with its Matlab program. The iterative part is based on the Barnes
algorithm [19-21], which is a variation of the Karmarkar projective transformation
algorithm [15, 23].

The non-iterative part is simply the Matlab command x = B\b or, alternatively, a
physically concise algorithm [8-10] suitable not only for square linear systems but
also for non-square rectangular linear systems. One can think of 3 ways to solve the
LP.


(i) One way is to use only the iterative part (due to Barnes) for a sufficient number
of iterations till we obtain the optimal solution of the LP to a desired accuracy.
There is no 2nd part, viz., solving the linear system Bx = b non-iteratively, nor
is there any optimality test.

(ii) The 2nd way is to take advantage of the monotonic convergence of the solution
vector x in Barnes iterations. Starting from the known initial feasible solution
(which is always the case for solving any LP), one or more elements of the
solution vector could initially (during the first few iterations) increase and then
decrease, or vice versa, before converging monotonically (increasing continuously
or decreasing continuously) toward the required solution of the LP.


After a sufficient number of iterations we will discover that there are elements of 
x, whose values have gone sufficiently close to 0. These elements of x will be the 
possible non-basic variables while the remaining elements of x will be the possible 
basic variables. 

After the detection of the basic variables, we discontinue the iteration, shrink the
linear system Ax = b to the system Bx = b which will contain only the basic variables,
and solve the shrunk linear system (consisting of only basic variables while the values of
the other variables are made zero) mathematically non-iteratively to obtain the required
optimal solution.

To ensure that the detection of the basic variables is correct, we perform an
optimality test for the computed solution of the LP. With this procedure we can save a
significant number of iterations. The failure of the optimality test, if any, would
prompt us to continue the Barnes iterations further along with the optimality test
after each iteration till the test is successful. It can be seen that the optimality test
works well if the LP is bounded.

For unbounded LPs, the optimality test after each iteration may be satisfied 
giving an impression that the optimal solution has been reached. In our program 
we perform one more iteration along with the optimality test to see if the objective 
function value decreases further. If this happens then the LP is unbounded. 

On the other hand, if the iterations are continued for a sufficient number of times 
before the optimality test is performed, then the unboundedness will unfold. 


(iii) The 3rd way is to use Barnes iterations followed by the optimality test after each
iteration, starting from the very 1st iteration. There is no solving of the linear system
Bx = b non-iteratively (or iteratively).
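The detection step used in the 2nd way can be sketched as a simple thresholding of an iterate. The iterate xk and the threshold tol below are hypothetical; in practice they would come from the Barnes iterations and from the accuracy desired by the user.

```matlab
% Sketch: detect candidate basic variables from an iterate by thresholding.
% The iterate xk and the threshold tol are hypothetical.
A = [1 1 1 0; 1 3 0 1];
b = [4; 6];
xk = [2.9999; 1.0001; 3e-7; 1e-8];   % assumed iterate after several iterations
tol = 1e-4;                          % assumed threshold for numerical zero
basic = find(xk > tol)';             % candidate basic variables
m = size(A, 1);
if numel(basic) == m
    x = zeros(size(xk));
    x(basic) = A(:, basic)\b;        % shrink to Bx = b and solve non-iteratively
else
    x = [];                          % detection inconclusive: iterate further
end
disp(basic); disp(x');
```

If the number of detected variables differs from m, the sketch leaves x empty, which corresponds to continuing the iterations before attempting the non-iterative solve.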


The Matlab code presented here is for the user to readily comprehend the proce- 
dure, modify the code easily if necessary, and use the code just by copying, pasting, 
and then executing for a given numerical LP. The highest level Matlab code is meant 
for scientists, engineers, and others who may not have formal knowledge in computer 
programming and who are essentially interested in the solution of their computational 
problems and not in the intricacies of computer programming. 

Proposed algorithm. The proposed algorithm comprises an iterative-cum-non-iterative
algorithm with error, complexity, and Matlab code. The iterative part is the
Barnes method [19] while the non-iterative part is a physically concise procedure
or the Matlab command x = B\b to solve the linear system Bx = b involving basic
variables only.

These are followed by the Matlab code that combines the Barnes method (iterative) 
including the detection of basic variables, and the linear system solver (non-iterative). 
We then include the error and time-cum-storage complexities of the algorithm we 
propose. 

We omit here the mathematical description of Barnes algorithm [17, 19-21].
However, the reader would find in our proposed algorithm (described later) the
Barnes algorithm in the form of Matlab code with requisite comments. This code
(with comments) is self-explanatory, very high level, easy to follow (user friendly),
and can be modified as per the user's special requirement. Like any other formal
(programming) language, it is interpretation-independent.

The proposed algorithm is essentially a partial Barnes iterative algorithm along 
with a non-iterative linear system solver and optimality test. The algorithm is as 
follows. 


1. Perform a few iterations of the foregoing Barnes algorithm.

2. Choose those variables of the vector $x^k$ which tend to assume positive values
and reject those which tend to 0 along with their corresponding columns in the
coefficient matrix A and their corresponding coefficients in the
cost vector c. Call the resulting coefficient matrix B, the resulting cost vector $c_B$,
and the resulting (yet to be computed) solution vector $x_B$.

3. Compute the (basic) solution vector $x_B$ of the linear system $Bx_B = b$. Construct
the complete solution vector $x_0$ of the LP by inserting 0 for the value of the
non-basic variables.

4. Test the optimality of the computed solution $x_0$ of the LP Minimize c'x subject
to Ax = b, x ≥ 0 as follows, assuming that it is bounded.

   Compute $y' = c_B' B^{-1}$ (row vector).
   Compute $z_j - c_j = y' p_j - c_j$ (scalar), where $p_j$ is the jth non-basic vector in
   A.

   If all $z_j - c_j \le 0$, then the solution is optimal [25]. Otherwise the monotonic
   convergence of the solution vector in Barnes algorithm has not yet set in. Continue
   further iterations of Barnes algorithm till monotonic convergence is achieved.
   Redo Steps 2, 3, and 4.
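Steps 3 and 4 can be sketched as follows for a small LP. The data and the index set basic are hypothetical, and the tolerance used in the sign test is an assumption; the sketch is not the program given later in this section.

```matlab
% Sketch: optimality test z_j - c_j <= 0 for a detected basis.
% LP data and the index set "basic" are hypothetical.
A = [1 1 1 0; 1 3 0 1];
b = [4; 6];
c = [-1; -2; 0; 0];                      % Min c'x s.t. Ax = b, x >= 0
basic = [1 2];                           % assumed detected basic variables
nonbasic = setdiff(1:size(A, 2), basic);
B = A(:, basic);
xB = B\b;                                % Step 3: basic solution
y = (c(basic)'/B)';                      % y' = cB'*inv(B), computed without inv
zc = A(:, nonbasic)'*y - c(nonbasic);    % z_j - c_j for each non-basic column p_j
isOptimal = all(xB >= 0) && all(zc <= 1e-12);
disp(zc'); disp(isOptimal);
```

The right division c(basic)'/B solves y'B = cB' directly, which avoids forming the inverse of B explicitly when B is ill-conditioned.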


The precise iteration in which the monotonic convergence sets in is not known 
a priori. Hence the test of optimality is required to be used to determine whether 
the solution is truly optimal or not. The computational complexity is marginally 
increased. This second way of implementation still remains polynomial time. 

Detection of basic variables. The monotonic convergence of the algorithms permits
the detection of both basic and non-basic variables in the foregoing LP.
Table 4 depicts the detection for different types of LPs.

Error and Complexity. The foregoing main algorithm finally involves only the solution
of the linear system Bx = b. The error associated with the solution of the LP is
only the error in the linear system. If $x^c$ is the computed solution vector of Bx = b,
then the relative error in the solution of the LP may be given as $\|Bx^c - b\| / \|x^c\|$.

The relative error is much more useful than the
absolute error, since the absolute error does not convey to the concerned person
the comparative/relative quality of the solution.
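This error measure is straightforward to evaluate. The sketch below does so for a hypothetical near-singular basis matrix, with the perturbation size chosen only for illustration.

```matlab
% Sketch: relative error ||B*xc - b||/||xc|| of a computed solution xc.
% The matrix B and the perturbation size are hypothetical.
B = [1 4 7; 2 5 8; 3 6 9+0.5e-6];
b = sum(B, 2);                       % row sums
xc = B\b;                            % computed solution
abserr = norm(B*xc - b);             % absolute (residual) error
relerr = abserr/norm(xc);            % relative error as defined above
fprintf('absolute error %g, relative error %g\n', abserr, relerr);
```

Note that this measure is residual-based: for an ill-conditioned B a small value does not by itself guarantee that xc is close to the exact solution.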

The computational and storage complexities are as follows. To compute the
(k + 1)st iterate $x^{k+1}$ in Barnes algorithm we need $O(mn^2)$ operations. The number
of iterations has been found, through our numerical experiments, to be independent
of the dimension (order) m or n of the matrix A. We do not, however, have a
rigorous mathematical proof. Thus the Barnes algorithm is polynomial time and its
computational complexity is $O(mn^2)$.

As a matter of fact, since the Barnes algorithm is a variation of the Karmarkar
polynomial-time $O(n^{3.5})$ projective transformation algorithm, it seems to be
polynomial time. The power of the dimension n, viz., 3.5, is much more important than
the constant coefficient (independent of n) of $n^{3.5}$.

We believe, through our numerical computations of the solution of LPs with
linear constraints of different dimensions, that Barnes algorithm has a computational
complexity $O(n^3)$. We do not know at this time the exact computational complexity
of Barnes algorithm in the mathematical/computational science sense.

Table 4 Detection for different types of LPs

Primal LP | Dual LP | Solution type | Procedure for detection of basic variables
Non-degenerate | Non-degenerate | Unique | After a sufficient number of iterations, the m columns of A corresponding to the m basic variables which have larger values are chosen and the resulting linear system with an m × m non-singular coefficient matrix is solved. The remaining n − m non-basic variables are set to 0
Non-degenerate | Degenerate | Multiple having k > m variables positive | If the algorithms converge to a solution with exactly m variables non-zero, then these are basic (not the case, in general). Otherwise detection is hard
Degenerate | Non-degenerate | Unique having k < m variables positive | The basis here is not unique and may not be detectable. The k variables will produce the optimal solution allowing the remaining n − k variables 0. After a sufficient number of iterations, choose m columns that include the k columns and solve the resulting m × m system allowing the values of arbitrary m − k variables 0
Degenerate | Degenerate | Multiple | Detection is hard here too. For details of detection, see [20]


Further, the solution of a linear system resulting from the detection of basic
variables through Barnes algorithm again needs $O(mn^2)$ operations. Hence the proposed
iterative-cum-non-iterative algorithm requires $O(mn^2)$, i.e., $O(n^3)$, operations and
hence is polynomial time.

Our numerical experiment shows that the $O(n^{3.5})$ Karmarkar algorithm, on the other
hand, involves many times more iterations than those required by Barnes algorithm
for a specified accuracy. We have not studied the relationship between the number of
iterations in Barnes algorithm and those of Karmarkar algorithm for LPs of varying
dimensions. It could be an interesting topic for further exploration numerically as
well as mathematically.

Further, determination of the relationship between the time complexities measured
on a computer for both Barnes and Karmarkar algorithms for LPs of
different sizes would also be interesting, not only from the sequential computation point
of view but also from the parallel computation point of view. The parallel version of Matlab
is becoming increasingly popular among laptop/desktop users who have a
multi-processor computing system.

The storage complexity in Barnes algorithm is m(n + 1) + m + n + 1 plus the
storage required by the formal code (program) of the algorithm. Thus it is much less
than that in Karmarkar algorithm [32]. Since the (general) program needs a
fixed number of storage locations irrespective of the problem size and this number
is usually small compared to the size of a real-world LP (which is often large), we
neglect the storage required by the program while computing the order of storage
complexity.

The conversion of an LP in an inequality form, viz., Min c'x s.t. Ax ≤ b, x ≥ 0,
to the Karmarkar form of LP (KLP), where the matrix A is m × n and consequently
the vector x to be computed has n elements, increases the dimension significantly.
The dimension m × n of A in LP increases to the dimension (m + n + 3) × (2m + 2n
+ 3) in KLP [14]. The dimension m of the right-hand side vector b becomes m + n
+ 3 in KLP.

Thus, observing that the cost vector c in the foregoing LP has n elements, the
percentage increase in storage locations is


% increase = 100[(m + n + 3)(2m + 2n + 3) − (mn + m + n)]/(mn + m + n)


If m = 10 and n = 15 then the increase in the number of storage locations in KLP
will be 748%, i.e. nearly 7.5 times that of the LP in inequality form! If m = 100 and
n = 150 then this increase will be 734.49%, i.e. roughly 7.3 times that of the LP in
inequality form.
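The percentage formula is easy to evaluate for any problem size. The following sketch wraps it in an anonymous function; the name pctIncrease is ours and purely illustrative.

```matlab
% Sketch: percentage increase in storage when an LP in inequality form
% is converted to the Karmarkar form (function name is illustrative).
pctIncrease = @(m, n) 100*((m+n+3).*(2*m+2*n+3) - (m.*n+m+n))./(m.*n+m+n);
p1 = pctIncrease(10, 15);
p2 = pctIncrease(100, 150);
fprintf('m=10,  n=15:  %.2f%%\n', p1);
fprintf('m=100, n=150: %.2f%%\n', p2);
```

Because the function uses elementwise operators, it can also be applied to vectors of m and n to tabulate the increase over a range of problem sizes.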

However, an intelligent implementation could reduce the storage requirement to 
an extent in both Barnes and Karmarkar algorithms. For large sparse problems 
where most of the elements of the matrix A are zero, a programming technique for 
sparse matrices may be superimposed on any of the algorithms. 

Matlab code We now present a Matlab program for the foregoing iterative-cum- 
non-iterative algorithm i.e. the proposed algorithm. Although the Matlab program 
can compute the optimal solution using entirely the Barnes algorithm, one may stop 
continuing the iterations once the monotonic convergence sets in and then detect 
those variables which tend to positive values different from numerical zero [20]. 
These variables are basic variables while other variables that tend to 0 are non-basic 
variables. 

We remove those columns from the coefficient matrix A, that correspond to the 
non-basic variables. We also remove the non-basic variables from the solution vector 
x (without changing the identity of the basic variables). The resulting solution vector, 
also called x, is computed using the Matlab command x= A\b. 

This computed solution vector, along with the non-basic variables with value 0, constitutes the required optimal solution if it passes the optimality test. The following Matlab program is neither completely automated nor completely general, nor is it the most efficient.

One can make it so with some programming effort. Unlike the mainframe computing era of the mid- and late twentieth century, when many users shared a single physically large computer in batches, each user now has a high-speed computer (laptop/desktop) always at their disposal.

Thus one can inspect the outputs/results again and again and decide on the basic variables accordingly, instead of having the computer perform this task through additional code supplied by the user (to make the program fully automatic).

The Matlab program below is self-explanatory, needs no formal programming 
knowledge to follow, and can be easily modified by the user according to their need. 

It may be noted that an algorithm written in (formal) Matlab code is interpretation-independent (i.e. no 2 distinct interpretations are possible, unlike what sometimes happens with one written in a natural language such as English) and unambiguous.


function [] = Barnesbs(A, b, c, epsilon, c1, p)
% LP: Min c'x s.t. Ax = b, x >= 0
% Method: Proposed algorithm with Barnes iteration + optimality test
% A = m x n coeff. matrix, b = rhs response vector, c = cost vector,
% epsilon = exit parameter, c1 = parameter, say c1 = 10,
% added to coeff. of artificial var.
% Parameter p is taken to be a large value, say 0.2, for general,
% multiple solution, and infeasible LPs. For unbounded problems, p
% is taken to be a smaller value, say 0.01, so that unboundedness
% is detected before reaching the optimality test.
% Algorithm - Barnes' iterative method plus optimality test

% Get the size of the coeff. matrix A
[m, n] = size(A), ctr = 1, optsol = 0, boolmul = 1,

% Determine the size of the new coeff. matrix A and cost
% vector c when artificial var. is added to the system.
n1 = n + 1; m1 = m + 1;

% Initialize solution vector to zero
xk = zeros(n1, 1); phi = zeros(1, n1);

% Assign cost vector c to the new vector cf, which will be used
% later to compute final objective function value (ofv) when
% solving the LP with only the determined/known basic vars.
cf = c;

% Determine the coeffs. of the artificial var. in matrix A
for j = 1:m
    A(j, n1) = b(j) - sum(A(j, :));
end

% Determine the coeff. of the artificial var. in cost vector c
c(n1) = sum(abs(c)) + c1;
for i = 1:n1
    xk(i) = 1;
end;
f = c'*xk; total = sum(xk);

% Do iterations in while loop till optimality test is satisfied
while 1
    % Initialize diagonal matrix with the vector xk
    Dk = diag(xk);

    % Determine lambda value
    lambdak = (pinv(A*Dk^2*A')*A*Dk^2*c);

    % Determine the projection vector
    beta = norm(Dk*(c - A'*lambdak));

    % Compute xk(i)*(c(i) - A(:,i)'*lambda) for each var.
    for i = 1:n1
        phi(i) = xk(i) * (c(i) - A(:, i)'*lambdak);
    end

    % Compute the step length Rk for the iteration
    j = 1;
    for i = 1:n1
        if phi(i) > 0
            del(j) = beta/phi(i); j = j + 1;
        end
    end
    Rk = 0.9 * min(del);

    % Compute the new feasible solution xk
    xk = xk - ((Rk * Dk^2*(c - A'*lambdak))/beta);
    f1 = f;

    % Compute the new ofv
    f = c'*xk; total = sum(xk);

    % Find if LP is infeasible by checking the artificial var. value
    if xk(n1) > 0.3 * total
        disp('The problem is not feasible'); break;
    end

    % Arrange components of solution vector in descending order
    [X, Index] = sort(xk, 'descend');

    % Check artificial var. for the problem infeasibility
    if xk(n1) > p * total/n1
        disp('The problem is infeasible'); break;
    end

    % Check if problem is unbounded
    if (abs((f1^2 - f^2)/(f1 + f)) > (p * abs(f)))
        disp('Check solns. for possible unboundedness');
        boolmul = 0;
    end

    % Check if multiple solns. exist
    if X(m1) > p * total/n1
        disp('Check for possible multiple solutions ');
        boolmul = 1;
    end

    j = 1; k = 1; B = []; cB = []; Ind = []; Ind1 = []; Zk = [];
    % Form the basis B with the m largest vars. from soln. vector xk
    % (These are already stored in X)
    for i = 1:m
        B(:, i) = A(:, Index(i)); cB(i) = c(Index(i)); Ind(i, :) = Index(i);
    end
    for i = m1:n1
        Ind1(j, :) = Index(i); j = j + 1;
    end

    % Determine the number of vars. converging to a positive value
    S = size(Ind);

    % Determine the soln. vector for the system comprised of basic vars.
    x0 = B\b; r = min(x0);

    if S(1) <= m && r >= 0
        % Perform optimality check, i.e., for the nonbasic vars.
        % compute the reduced cost and check if reduced cost is <= 0
        % for all nonbasic variables
        newcount = n1 - S(1);
        for k = 1:newcount
            pj = A(:, Ind1(k, :)); cj = c(Ind1(k, :)); Zk(k) = ((cB*inv(B))*pj) - cj;
        end
        r1 = max(Zk);
        if r1 <= 0
            % Initialize a soln. vector of zeros with n vars. and
            % make the soln. vector complete by inserting appropriate
            % values for basic vars.
            y = zeros(n, 1); counter = 1;
            for j = 1:S(1)
                y(Ind(j, :)) = x0(counter);
                counter = counter + 1;
            end
            if boolmul == 1
                disp('The optimal solution to the LP is '); disp(y')
                disp('Objective function value is '); disp(cf'*y)
                disp('The number of iterations performed is '); disp(ctr)
                break;
            else
                if optsol == 0
                    fopto = cf'*y; optsol = 1;
                else
                    foptn = cf'*y;
                    if foptn < fopto
                        disp('Check solns for possible unboundedness'); break;
                    else
                        disp('The optimal solution to the LP is '); disp(y')
                        disp('Objective function value is '); disp(foptn)
                        disp('The number of iterations performed is '); disp(ctr)
                        break;
                    end
                end
            end
        end
    end
    ctr = ctr + 1;
end


Numerical examples: Proposed algorithm


LP1 Min $-x_1 - 3x_2$ s.t. $2x_1 + 3x_2 + x_3 = 6$, $-x_1 + x_2 + x_4 = 1$, $x_i \ge 0$ for all i.

Assuming that the code for the Matlab program Barnesbs(A, b, c, epsilon, c1, p) is already stored in the computer among the Matlab files, we give the following commands (after the prompt >>)


A = [2 3 1 0; -1 1 0 1], b = [6 1]', c = [-1 -3 0 0]', epsilon = 10^-5, c1 = 10, p = 0.2, Barnesbs(A, b, c, epsilon, c1, p)


in 1 line in the command window and execute the program. The outputs that we get are:

The optimal solution to the LP is [0.6 1.6 0 0]', Objective function value (ofv) is -5.4. The number of iterations performed is 1.

LP2 Min $-3x_1 - 4x_2 - x_3 - 2x_4 + 2x_5 - x_6$ subject to $x_1 - 2x_3 + x_4 - x_6 + x_7 = 2.5$, $x_1 - 2x_2 + 2x_4 - x_5 + x_6 - x_7 = 6.5$, $x_4 + x_5 + x_6 = 1$, $x_2 + x_3 = 1$, $x_i \ge 0$ for all i.

Entering the following commands (after the prompt >>) in the command window:

A = [1 0 -2 1 0 -1 1 0; 1 -2 0 2 -1 1 -1 0; 0 0 0 1 1 1 0 0; 0 1 1 0 0 0 0 0], b = [2.5 6.5 1 1]', c = [-3 -4 -1 -2 2 -1 0 0]', epsilon = 10^-7, c1 = 10, p = 0.2, Barnesbs(A, b, c, epsilon, c1, p)

we obtain the outputs (omitting some of the parts) as follows


Rows of A, each followed by the corresponding element of b:

1  0 -2  1  0 -1  1  0    2.5
1 -2  0  2 -1  1 -1  0    6.5
0  0  0  1  1  1  0  0    1.0
0  1  1  0  0  0  0  0    1.0


epsilon = 1.0000e-007, c1 = 10, p = 0.2000.


Warning: Matrix is singular to working precision. 
> In Barnesbs at 95 

Warning: Matrix is singular to working precision. 
> In Barnesbs at 102 


Warning: Matrix is singular to working precision. 

> In Barnesbs at 102 

The optimal solution to the LP is x = [5.5 0 1 0 0 1 0 0]
Ofv is -18.5000. (The number of iterations performed is 67.)




To conserve space we have included only a couple of examples here although we 
have experimented/tested our algorithm on many LPs of each kind such as general 
LP, unbounded LP, multiple solution LP, near-multiple solution LP, degenerate LP, 
infeasible LP, and LP that involves cycling in the simplex algorithm. 

Besides, the experiment was also conducted on relatively large LPs with random matrices. The value of the parameter p in the Matlab program plays an important role in reducing the number of iterations in the 1st part of our main algorithm, i.e. in the Barnes algorithm. The following computer outputs are self-explanatory.

In most such Matlab outputs there is no provision for printing superscripts and subscripts in two or more levels. We have tried, as far as possible, to retain the computer outputs as they are without altering them. This is essentially to avoid introducing human errors. We believe that the reader will have no difficulty in comprehending the computed results.

In each example the LP is followed by the Matlab command that will execute 
the foregoing Matlab program (code). In some examples, we have omitted several 
portions of the computer output to save space. It may be seen that the vectors b, c, 
and xk are all column vectors although these have been written in a row to conserve 
space. 

Detection of basic variables In the proposed algorithm, there are two ways of sieving out the basic variables. The 1st way is to choose m variables which tend to a positive value v, where [20]

$$v > \frac{p \sum_{i=1}^{n+1} x_i}{n + 1},$$

the parameter p is generally chosen to be 0.2.

The 2nd way is to choose the m largest variables occurring in the solution vector. The 1st way of detecting basic variables and constructing a linear system comprising only the basic variables does not guarantee a square system.

In such a case, the inv command in Matlab used for the optimality test, would 
produce an error. The parameter p then needs to be increased or decreased in order 
to make exactly m number of variables to be basic. 

We observed that for a degenerate LP the parameter p had to be increased to 0.3, whereas in the case of a multiple solution LP the parameter had to be set to 0.01 to make exactly m variables basic. The Matlab code detects basic variables using the 2nd way. In this manner of implementation, the parameter p has no role to play.
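The two ways of detecting basic variables can be contrasted on a small example. This is only a sketch: the iterate xk below is made-up data (assumed here purely for illustration, with the artificial variable as its last entry), and the variable names are not those of the chapter's program.

```matlab
% Hypothetical Barnes iterate of length n + 1 (illustrative values only)
xk = [2.9; 0.02; 1.4; 0.01; 0.003];
m  = 2;      % number of constraints, i.e. required number of basic vars.
p  = 0.2;    % threshold parameter of the 1st way

% 1st way: variables exceeding p * sum(xk) / (n + 1)
threshold = p * sum(xk) / numel(xk);
basic_by_threshold = find(xk > threshold)';

% 2nd way: indices of the m largest variables
[~, order] = sort(xk, 'descend');
basic_by_rank = sort(order(1:m))';

disp(basic_by_threshold)
disp(basic_by_rank)
```

The 2nd way always returns exactly m indices, so the basis matrix is square by construction; the 1st way returns however many variables happen to exceed the threshold, which is why p may need adjusting.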

The optimality test in our Matlab implementation starts from the very 1st iteration of the Barnes algorithm. The user can modify the code to start the optimality test after a few iterations (say k = 5 iterations).

As stated earlier, we have experimented with a large number of numerical LPs, including multiple solution, degenerate, cycling (occurring in a simplex method), and random coefficient matrix LPs. The results were satisfactory. We have, however, included only a couple of LPs to conserve space.

In our numerical experiments over various types of numerical LPs, we have 
compared the time complexities (proportional to computational complexities) of 3 
programs viz. the Barnes, our proposed, and Matlab linprog (based on the primal-dual 
simplex method) programs. 

We have found that the proposed algorithm is competitive with both the Barnes and Matlab linprog algorithms.

For general LPs we have observed that the time complexity of the proposed algorithm is less than that of linprog. The optimal solutions obtained by both the proposed algorithm and linprog correspond to a basic solution in all cases, unlike the Barnes algorithm, which in some cases produces a nonbasic solution as the optimal solution (having more positive variables than the number of basic variables).

In general, the optimal solution obtained using proposed algorithm and linprog 
are more accurate compared to that obtained using Barnes algorithm. 


3 Solving a Nonlinear Equation in One Variable Having 
Root-Clusters and/or Multiple Roots 


The most challenging issue in solving a nonlinear equation in one variable by any 
method arises when the roots are clustered, i.e., closely spaced. Some methods 
perform better than some others, but all the methods have varying degrees of accu- 
racy under a fixed precision. Usually in almost all computational environments, all 
computations in an algorithm are carried out using a standard precision, say 15 signif- 
icant digits and not using variable precisions. So, we assume that our precision of 
computation is fixed for all computations and for all algorithms. 

For a multiple, i.e., a repeated root, the popular Newton method converges only slowly and, in finite precision, may oscillate around the root without settling on it. The Newton scheme to compute a root of the nonlinear equation f(x) = 0 is


$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}, \quad i = 0, 1, 2, 3, \ldots, \quad \text{till } \left|\frac{x_{i+1} - x_i}{x_{i+1}}\right| < 0.5 \times 10^{-4} \text{ or } i = 30 \text{ (say)},$$


where $x_0$ is a specified numerical initial approximation (reasonably a good one). When the iteration scheme converges and the relative error $|(x_{i+1} - x_i)/x_{i+1}|$ satisfies the foregoing inequality constraint, we get the root correct up to four significant digits. It may be seen that we should not use the absolute error constraint, viz., one on $|x_{i+1} - x_i|$, instead of the relative error constraint. In all numerical computations, it is the relative error that is most important to determine the quality of roots/solution. The absolute error is, in general, not useful.

For a repeated root, $f(x_i)$ in the Newton scheme converges to zero faster than $f'(x_i)$ when f(x) is a polynomial, i.e., a polynomial of finite degree which is a positive finite integer. Observe that both the numerator $f(x_i)$ and the denominator $f'(x_i)$ will be exactly zero at a repeated root. No computational overflow or division by zero will result because of the faster rate of convergence of the numerator $f(x_i)$. For most small problems with reasonably high precision computation, we will get four significant digit accuracy by the foregoing Newton scheme without any modification in it. For example, consider the polynomial equation $f(x) = x^3 - 3x^2 + 3x - 1 = 0$. This equation has three repeated roots, viz., 1, 1, and 1. If we take the initial approximation, i.e., the zeroth iterate, $x_0 = 5$ (not a good approximation), then the following Matlab program (script file), run as >> newtonrepeatedroot,


format long g; x=5;
for i=1:30, x1=x-(x^3-3*x^2+3*x-1)/(3*x^2-6*x+3);
if (abs(x1-x)/abs(x1))<= 0.5*10^-4, i, x=x1, break; else x=x1, end;
end;


will produce the successive iterates as 

3.66666666666667 (first iterate), 2.77777777777778, 2.18518518518519,
1.79012345679012, 1.52674897119342, 1.35116598079561, 1.23411065386374,
1.15607376924249, 1.104049179495, 1.06936611966335, 1.04624407977559,
1.03082938651708, 1.02055292434476, 1.01370194956347, 1.00913463304184,
1.00608975536149, 1.00405983690544, 1.00270655794857, 1.00180437198957,
1.00120291466073, 1.00080194312401, 1.00053462850456, 1.00035641803874,
1.000237612102, 1.00015841052562, 1.00010560237306, 1.0000703997415 (27th
iterate).

Since the relative error abs(x1-x)/abs(x1) is about 3.52 × 10^-5 < 0.5 × 10^-4, we have obtained the required root 1.0000703997415 at the 27th iterate. Observe how slowly the successive iterates approach the root: the error shrinks by a factor of only about 2/3 per iteration. To obtain the other two roots, we may deflate the polynomial using the computed root. However, this deflation of the cubic polynomial to a quadratic polynomial will involve numerical error which may be unacceptable, particularly for many repeated roots.
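The slow approach to the triple root can be made visible by tracking the error ratio of successive iterates. The following is a minimal sketch; the handles f and fp and the array names are introduced here for illustration only. For this particular cubic, f(x) = (x - 1)^3, so each Newton step maps x - 1 to (2/3)(x - 1).

```matlab
% Newton iteration on f(x) = x^3 - 3x^2 + 3x - 1 = (x - 1)^3 from x0 = 5
f  = @(x) x^3 - 3*x^2 + 3*x - 1;
fp = @(x) 3*x^2 - 6*x + 3;

x = 5;
err = zeros(1, 10);            % distance of each iterate from the root 1
for i = 1:10
    x = x - f(x)/fp(x);
    err(i) = abs(x - 1);
end

% Ratio of successive errors; a constant ratio indicates linear convergence
ratios = err(2:end) ./ err(1:end-1);
disp(ratios)
```

Only the first ten iterates are used so that the function values stay well above rounding level; very close to the root the computed value of the expanded cubic is dominated by rounding error.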

If we replace 30 by 100 and 0.5*10^-4 by 0.5*10^-8 in the foregoing Matlab program newtonrepeatedroot, we then obtain the computed root 1.0000071105183 at the 34th iteration. This computed root is correct up to 8 significant digits.

A better way is to employ the deflated Newton method [12] to obtain the repeated 
roots. This method produces the root much more accurately than that produced by 
the aforesaid Newton method. One should use this deflated Newton method and not 
the Newton method when more involved multiple root problem crops up with respect 
to a polynomial. 

A symbolic-cum numerical Newton scheme such as the one (Newton6.m) 
below could be more accurate in computing a zero of a univariate function having 
zeros. The first order derivative is exactly automatically computed symbolically by 
the Matlab command diff(f) 


function [] = newton6(f, x);
% Take (e.g.) f as x^5-15*x^4+85*x^3-225*x^2+274*x+20 and x as 2.3

fp = diff(f),
disp('iteration x-value f-value rel_error')
% Enter above line with appropriate no. of spaces

iteration = 0;
disp([iteration x eval(f)]);
for i = 1:40
    iteration = iteration + 1;
    x1 = x - (eval(f)/eval(fp));
    rel_error = abs(x1 - x)/abs(x1);
    if (rel_error) >= .5*10^-10,
        x = x1; fvalue = eval(f); disp([iteration x fvalue rel_error]);
    else break; end;
end;
x = x1, fvalue = eval(f); disp([iteration x fvalue rel_error]);

% To execute newton6, enter into Matlab command window the line
%
% clear all, format long g; syms x, double x, newton6(x^5-15*x^4+85*x^3-225*x^2+274*x+20, 2.3)
%
% Enter above line (without %) as 1 line in Matlab command window


The output produced is 


iteration  x-value               f-value                  rel_error

0          2.3                   138.74693
1          66.8484670853657      1059789337.0902          0.965593825852986
2          54.0850389682376      347245154.407416         0.235988146826031
3          43.875861344128       113771678.226548         0.23268324111148
4          35.7104729654545      37273747.3154274         0.228655285147653
5          29.1805983823048      12210314.8429822         0.223774526402767
6          23.9597286522407      3999275.56886078         0.217901872172325
7          19.7867822148566      1309576.78985901         0.210895657114521
8          16.4530170597472      428674.689020384         0.202623333033887
9          13.7915106722633      140255.680289789         0.19298149787438
10         11.6686107873019      45867.0853336381         0.181932530243583
25         -0.422225939675234    -142.690011119076        1.28254934854888
26         -0.144692873642574    -24.6205817776276        1.91808386305349
27         -0.073253169203583    -1.31257008420433        0.975243872936723
28         -0.0689964979543511   -0.00441721870155831     0.0616940189058311
29         -0.068982075898359    -5.05480635126787e-008   0.000209069614154032
30         -0.0689820757333181   3.5527136788005e-015     2.3925250139224e-009

x = -0.0689820757333182

31         -0.0689820757333182   -3.5527136788005e-015    2.0117962036204e-016

Remarks The 31st iteration values are required to compute the relative error in the solution/zero x of the equation at the 30th iteration. The zero computed here is sufficiently accurate. To compute the other 4 zeros, we may deflate the polynomial by dividing it by the linear factor x-(-0.0689820757333182). The resulting 4th degree polynomial is then allowed to replace the 5th degree one in the foregoing newton6.m program.

For the derivative of a transcendental function, e.g. sin(x) at x = pi, we may write the commands in the Matlab command window


>> clear all; close all; syms x; f = sin(x); d = diff(f), x = pi, value = eval(d)


which produce d = cos(x), x = 3.14159265358979, value = -1. This symbolic derivative command may be used appropriately in newton6.m to obtain 1 zero of the given transcendental function. For further information, refer to [12, 26, 27].
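The deflation step described in the Remarks above, i.e. dividing the 5th degree polynomial by the linear factor of the computed zero, can be sketched with the built-in polynomial division deconv. The variable names below are illustrative assumptions; the coefficient vector is that of the example polynomial.

```matlab
% Coefficients of x^5 - 15x^4 + 85x^3 - 225x^2 + 274x + 20
pcoef = [1 -15 85 -225 274 20];
r = -0.0689820757333182;              % zero computed by newton6

% Divide by the linear factor (x - r): q is the 4th degree quotient,
% remainder should be numerically zero if r is an accurate zero
[q, remainder] = deconv(pcoef, [1 -r]);

disp(q)
disp(max(abs(remainder)))
```

The size of the remainder gives a quick check on how much error the deflation itself introduces before the quotient is passed on to the next Newton run.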

If we use a method which involves only the computation of the function (poly- 
nomial) and no division by its derivative(s), for example, the simplest and yet very 
effective successive bisection method for real as well as complex zeros [28], the 
multiple root problem such as oscillations and consequent loss of significance will 
not crop up. 
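As an illustration of a derivative-free method, the following is a minimal sketch of plain real bisection applied to the triple-root cubic used earlier. It is not the method of [28]; the bracket [0, 5] and the stopping width 1e-4 are assumptions chosen here. Bisection needs a sign change, which this cubic has because the multiplicity of the root is odd.

```matlab
% Bisection on f(x) = x^3 - 3x^2 + 3x - 1, which changes sign at x = 1
f = @(x) x^3 - 3*x^2 + 3*x - 1;
a = 0; b = 5;                  % assumed bracket: f(a) < 0 < f(b)

while (b - a) > 1e-4
    c = (a + b)/2;
    if f(a)*f(c) <= 0
        b = c;                 % sign change lies in [a, c]
    else
        a = c;                 % sign change lies in [c, b]
    end
end
root = (a + b)/2;
disp(root)
```

No division by the derivative occurs, so the vanishing of f' at the root causes no difficulty. The attainable accuracy is still limited by how accurately f itself can be evaluated near the root, which is why the bracket is not shrunk further in this sketch.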

Root-clusters Consider a 10th degree polynomial having 10 zeros 1.001, 1.002, 1.003, 1.004, 1.005, 2.001, 2.002, 2.003, 2.004, and 2.005. The constructed polynomial using Matlab commands

>> format long g; V = [1.001 1.002 1.003 1.004 1.005 2.001 2.002 2.003 2.004 2.005]; A = poly(V)

is

$$\begin{aligned} &x^{10} - 15.03x^{9} + 100.405395x^{8} - 392.404743x^{7} + 993.214606514523x^{6} \\ &- 1700.80200508075x^{5} + 1995.37494135908x^{4} - 1583.78804734281x^{3} \\ &+ 814.143990193428x^{2} - 244.841176478389x + 32.7270388349239 = 0 \end{aligned}$$

which is approximate. The vector A has the 11 elements 1, -15.03, ..., 32.7270388349239. The roots of this polynomial equation are produced by the Matlab command >> roots(A) as


2.01026595993287
2.00526396465983 + 0.00664983589543336i
2.00526396465983 - 0.00664983589543336i
1.99710305546508 + 0.00413235701062952i
1.99710305546508 - 0.00413235701062952i
1.00586076903522 + 0.00172652371275072i
1.00586076903522 - 0.00172652371275072i
1.00189549791341 + 0.00275536864917025i
1.00189549791341 - 0.00275536864917025i
0.999487465920064


Observe that out of these 10 roots 8 are complex conjugate roots while 2 are real roots. Clearly such results may not be acceptable. One way to reduce error to some extent is to use Matlab vpa commands


>> format long g; V = [1.001 1.002 1.003 1.004 1.005 2.001 2.002 2.003 2.004 2.005]; A = vpa(poly(V), 30)

These produce the vector A (after copying and pasting, omitting multiple blanks before the first element of A in the Matlab output) as


A = [1., -15.0300000000000000000000000000,
100.405394999999998617568053305, -392.404743000000053143594413996,
993.214606514522984070936217904, -1700.80200508075358811765909195,
1995.37494135908013959124218673, -1583.78804734281175115029327571,
814.143990193427725898800417781, -244.841176478389172643801430240,
32.7270388349238956493536534254];


The next Matlab command >> roots(A) produces slightly better roots which are


2.00651472714862
2.00395604784085 + 0.00268954661642278i
2.00395604784085 - 0.00268954661642278i
2.00028658848267 + 0.0014898027864992i
2.00028658848267 - 0.0014898027864992i
1.00737368656599
1.00431902401105 + 0.00368507379861619i
1.00431902401105 - 0.00368507379861619i
0.999494132808125 + 0.00223956079445499i
0.999494132808125 - 0.00223956079445499i


Even these 10 roots may not be acceptable in several real world applications. 
Observe that the Matlab command >> vpa(roots(A), 30) does not help to get still 
better roots. Yet another way to obtain these root-clusters is to use integer arithmetic 
[26]. For small root-clusters problems like the foregoing one, the exponential-time 
method described in [26] is good. 
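The loss of accuracy for the root-cluster example can be quantified directly by comparing the computed roots with the zeros used to construct the polynomial. This is a sketch only; the names R, worst, and largest_imag are introduced here, and the actual digits obtained depend on the Matlab version and platform, so no output is quoted.

```matlab
% Zeros used to build the polynomial, and its double-precision coefficients
V = [1.001 1.002 1.003 1.004 1.005 2.001 2.002 2.003 2.004 2.005];
A = poly(V);

% Roots recovered from the coefficients
R = roots(A);

% Spurious imaginary parts and deviation of the real parts from the true zeros
largest_imag = max(abs(imag(R)));
worst = max(abs(sort(real(R)) - sort(V(:))));

fprintf('largest imaginary part       : %.3g\n', largest_imag);
fprintf('largest real-part deviation  : %.3g\n', worst);
```

The true zeros within each cluster are only 0.001 apart, so a deviation of that order or larger means the individual zeros of a cluster cannot be told apart from the computed roots.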

System of nonlinear equations For a system of nonlinear equations involving 
both polynomial (algebraic) and transcendental equations, one may use a refining 
technique [1-3] that involves essentially the computation of the left-hand side of 
each of the equations after computing an approximate solution vector. The refining 
technique may be iteratively deterministic or stochastic. However, solution clusters 
problem continues to remain in a more or less pronounced manner. 

Error computation in a solution of a linear/nonlinear system needs a norm We, as human beings, cannot readily say which solution vector out of several solution vectors is the best. We can, however, easily compare several scalar real solutions of a single equation and readily say which scalar solution is the best (or a best). Hence we use a norm (usually the Euclidean norm) of the solution vectors, which is a scalar. Then we compare the solution vectors and easily determine the best one.
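One common way to turn this comparison into a single number per candidate is the Euclidean norm of the residual. The sketch below assumes that choice (the text does not fix which vector the norm is applied to) and uses a small made-up linear system with two hypothetical candidate solutions xa and xb.

```matlab
% Small linear system A*x = b and two hypothetical candidate solutions
A  = [2 3; -1 1];
b  = [6; 1];
xa = [0.6; 1.6];       % candidate 1
xb = [0.61; 1.59];     % candidate 2

% Euclidean norm of the residual b - A*x: one scalar per candidate
res_a = norm(b - A*xa);
res_b = norm(b - A*xb);

if res_a <= res_b
    disp('candidate xa is the better solution')
else
    disp('candidate xb is the better solution')
end
```

For a nonlinear system the same idea applies with b - A*x replaced by the vector of left-hand sides evaluated at the candidate solution.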




4 A Sensitive Nonlinear BVP: A Model for Gas Flow
Through a Porous Medium


The modeling of the unsteady isothermal flow of a gas through a semi-infinite porous 
medium results in a sensitive nonlinear two point boundary value problem (BVP) 
over an infinite interval for which the lower and upper solutions have been established 
analytically [29-34]. An interactive solution procedure is presented based on these 
lower and upper solutions (LUSOL) for this BVP whose conversion to an initial 
value problem (IVP) without making use of the LUSOL is usually not numerically 
possible [29]. In this context, the gas flow problem considered here is an excellent 
example where the knowledge of easily computable LUSOL is important if one does 
not want to land in a situation in which all the non-LUSOL based methods fail or 
miss the required solution. 

Since the mid-1980s, boundary value problems (BVPs) arising in diverse fields such as diffusion of a gas through a porous medium, heat conduction in the human brain, thermal self-ignition of a chemically active mixture of gases in a container, diffusion of heat produced by temperature-dependent sources, catalytic theory, adiabatic tubular reactors, oxygen diffusion in a cell with Michaelis-Menten kinetics, fluid dynamics, and electrical potential theory, have been studied [30-34].

Consider the unsteady flow of a gas through a semi-infinite porous medium initially filled with gas at a uniform pressure $p_0 > 0$ at time t = 0. The pressure at the outflow face is suddenly reduced from $p_0$ to $p_1 \ge 0$ ($p_1 = 0$ in the case of diffusion into a vacuum) and is thereafter maintained at this lower pressure. The mathematical problem of the unsteady isothermal flow of gas is a nonlinear partial differential equation [34, 35]. This model, after some mathematical derivations, turns out to be, in one dimension, the two point BVP associated with the second-order nonlinear ordinary differential equation (ODE) with the parameter $0 < \lambda < 1$:

$$y'' + \frac{2t}{\sqrt{1 - \lambda y}}\, y' = 0, \quad 0 < t < \infty, \quad \text{with } y(0) = 1 \text{ and } y(\infty) = 0 \qquad (2)$$

where the foregoing mathematical boundary conditions¹ (BCs) $y(0) = 1$ and $y(\infty) = 0$ can be written as the numerical BCs y(0.0001) = 1, y(3.1501) = 0 in this problem. The $\infty$ here is any numerical value > 3.1501 so far as the usable numerical accuracy in the real world situation as well as the available standard precision (15 significant digits in Matlab) are concerned. In fact, 0.005% accuracy or, equivalently, 4 significant digit accuracy is sufficient for almost all real world problems since any measuring device (electronic, optical, or otherwise) cannot usually measure any quantity with more than 0.005% accuracy nor can we implement in real problems more than 0.005% accuracy in any meaningful way [5, 29].

¹ By $y(\infty)$ throughout this chapter, we will imply $\lim_{a \to \infty} y(a)$ for real a.

One might think of solving the problem (2) in one of two ways: the lower and upper solutions (LUSOL) based way and the non-LUSOL based way. The non-LUSOL based way includes methods such as (i) finite difference based methods in which finite difference nonlinear equations are set up and then numerically iteratively solved making use of the BCs, (ii) the Mean Value theorem based method [36], (iii) the interpolation based method [37]. In all these methods, the knowledge of analytical LUSOL is useful but not essential. The latter two methods convert a two-point nonlinear BVP to an initial value problem (IVP) iteratively, which is then solved noniteratively. The reader may also refer to [12] for nonlinear two point boundary value problems in general.

The LUSOL based way may convert a nonlinear BVP to an IVP using the fore- 
going Mean value theorem or extrapolation-free successive interpolation iteratively 
and then solving the IVP noniteratively [36]. Or, it may simply use an IVP solver 
interactively, i.e., using a combination of artificial and natural (human) intelligence. 
This combination involves a simple computer program, a computer with the requisite 
software, and human inspection (of the output) and decision (in a semi-automated 
way). 

Ideally a comprehensive completely automated computer program that 
straightway produces the required solution is desired. However, in the foregoing 
gas flow problem such an automated way involves fairly complex intelligent deci- 
sions to be made by the computer and the writing of the concerned code (program) 
would need much more than reasonable amount of time and still may not be free 
from bugs arising out of unforeseen situations (not considered by a programmer in 
the program). 

Our motive/goal is to get the required solution of the BVP. Whether we achieve this 
by an artificial intelligent way (entirely by a large complex computer program) or by 
a combination of natural and artificial intelligent ways that involve human-computer 
dialogue is immaterial. The artificial intelligent way, i.e., completely an automated 
comprehensive computer program, may take unacceptably large code writing time 
and may involve bugs due to unforeseen situations as well as due to increased length 
of the code. 

The artificial-natural intelligent way, on the other hand, needs very little programming effort as the program is a physically small one just to solve a nonlinear IVP noniteratively, for which Matlab or any other very high-level programming language can be readily used. Such a programming language needs practically no formal programming knowledge. The concerned human programmer/user gets valuable information from the step by step integration of the IVP and makes use of the information to set up the next better IVP and continue the process using their own natural intelligence and making ready decisions. In this interactive procedure there is no chance of missing the actual solution because of the already known LUSOL [29].

In the concerned gas flow problem (2), we know after seeing the Matlab graphs of the lower and upper solutions that, at the $\infty$ end, we get both $y(\infty) = 0$ and $y'(\infty) = 0$. These conditions readily tell us that the BVP (2) degenerates to just an IVP and we may solve this IVP noniteratively starting from the $\infty$ end. Is this really possible? The answer is "no". The reason is that the speed at which y goes to 0 is significantly different from that at which y' goes to zero while we proceed towards $\infty$ (numerically here, say 3.1501).

For $\lambda = 0.9$, we have numerically, at t = 3.1501, y(3.1501) = 0, y'(3.1501) = -0.0000518, which produces the IVP with the required numerical solution. The conditions y(3.1501) = 0, y'(3.1501) = any other significantly different numerically small value, on the other hand, will produce an unacceptable result. If, for example, we take $\lambda = 0.9$, t = 3.1501 : -0.0158 : 0.0059, y(3.1501) = 0, and y'(3.1501) = -0.00000001, and solve the resulting IVP, we will get y(0.0059) = 0.0001 (y'(0.0059) = -0.0001), which is not anywhere near y(0) = 1 and thus completely wrong and so unacceptable.
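The sensitivity to the starting slope can be explored with a standard IVP solver. The following is a sketch under stated assumptions: it uses ode45 with tight tolerances (the chapter does not say which solver or step control was used), integrates backwards from t = 3.1501 to t = 0.0059, and the names rhs, Ya, and Yb are introduced here for illustration.

```matlab
lambda = 0.9;                        % parameter of the gas flow ODE (2)

% First-order system: y(1) = y, y(2) = y'
rhs = @(t, y) [y(2); -2*t*y(2)/sqrt(1 - lambda*y(1))];

tspan = [3.1501 0.0059];             % backwards from the numerical "infinity"
opts  = odeset('RelTol', 1e-10, 'AbsTol', 1e-14);

% Starting slope quoted in the text, and a much smaller slope
[~, Ya] = ode45(rhs, tspan, [0; -0.0000518],  opts);
[~, Yb] = ode45(rhs, tspan, [0; -0.00000001], opts);

fprintf('y(0.0059) for slope -5.18e-5: %g\n', real(Ya(end, 1)));
fprintf('y(0.0059) for slope -1e-8   : %g\n', real(Yb(end, 1)));
```

Comparing the two end values with the required y(0) = 1 shows at a glance which starting slope is acceptable, which is the inspection step the interactive procedure relies on.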

This speed difference makes this unsteady sensitive gas flow problem numerically very interesting, and the analytical LUSOL seems to be an excellent tool that keeps us from drifting away from the true solution and from wrongly concluding that a stable numerical solution is not obtainable.


4.1 Lower, Upper, and ODE Solutions and Matlab Program 


We present in this subsection, for different values of $\lambda$, the analytical expressions of the lower and the upper solutions [29]. Also, we present the ODE with its BCs and probable ICs and focus on the advantage of combining human inspection/judgement and a concise simple Matlab program (just for an IVP). For the nonlinear two-point BVP (2), the analytical lower solution $\alpha(t)$ and the analytical upper solution $\beta(t)$, called here LUSOL, are given by [33]

$$\alpha(t) = 1 - \frac{2(1 - \lambda)^{-1/4}}{\sqrt{\pi}} \int_0^t e^{-(1 - \lambda)^{-1/2} x^2}\, dx, \qquad \beta(t) = 1 - \frac{2}{\sqrt{\pi}} \int_0^t e^{-x^2}\, dx \qquad (3)$$

where $\beta(t) = \alpha(t)$ at $\lambda = 0$, which implies that for $\lambda$ near 0 the actual solution $y(t)$ of the BVP (2) will be very close to $\beta(t)$. That is,

$$y(t) \approx \beta(t) = 1 - \frac{2}{\sqrt{\pi}} \int_0^t e^{-x^2}\, dx \qquad (4)$$

The solution $y(t)$ is not exactly equal to $\beta(t)$ because the BVP (2) is defined only for $0 < \lambda < 1$ and not for $\lambda = 0$. In fact, $\alpha(t) < y(t) < \beta(t)$ for $0 < t < \infty$, implying that the solution space defined by $y(t)$ for various values of $\lambda \in (0, 1)$ is strictly open on both ends of t.
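Both bounds can be evaluated without explicit quadrature. Assuming the forms of the lower and upper solutions given in (3), the substitution u = x (1 - lambda)^(-1/4) reduces the lower solution to 1 - erf(t (1 - lambda)^(-1/4)), and the upper solution is 1 - erf(t). The sketch below uses these closed forms and cross-checks one value against numerical integration; the names alpha_t, beta_t, and alpha_quad are introduced here.

```matlab
lambda = 0.9;
t = linspace(0, 3.1501, 200);

% Closed forms of the lower and upper solutions via the error function
beta_t  = 1 - erf(t);
alpha_t = 1 - erf(t ./ (1 - lambda)^(1/4));

% Cross-check the lower solution at one point by direct quadrature of (3)
t0 = 0.5;
k  = (1 - lambda)^(-1/2);
alpha_quad = 1 - 2*(1 - lambda)^(-1/4)/sqrt(pi) ...
               * integral(@(x) exp(-k*x.^2), 0, t0);

plot(t, alpha_t, '--', t, beta_t, '-');
legend('lower solution', 'upper solution'); xlabel('t');
```

Any computed y(t) that leaves the band between the two curves can be rejected immediately, which is how the LUSOL guide the choice of the next IVP.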

The second order ODE with the BCs in (2) is written as the following system of two first order ODEs with two BCs:

$$\frac{dy_1}{dt} = f_1(t, y_1, y_2) = y_2, \quad \frac{dy_2}{dt} = f_2(t, y_1, y_2) = -\frac{2t}{\sqrt{1 - \lambda y_1}}\, y_2, \quad y_1(0) = 1, \; y_1(\infty) = 0 \qquad (5)$$

where $0 < \lambda < 1$, $y = y_1$, $y_2 = dy_1/dt = y'$. One may attempt to solve the nonlinear ODE (5) iteratively by methods described in [36, 37] without possibly making use of the knowledge of the mathematically derived lower and upper solutions defined in (3). Such an attempt may not often succeed for this particular gas flow problem (5). In fact, we are likely to miss the actual solution and might deviate from it, often without any chance of returning to it.

From the graph of the lower and upper solutions obtained using Matlab, we get 
valuable information about the BVP (5). It is readily clear from the graph that the 
initial conditions (ICs) at the ∞ end are

$$y(\infty) = y_1(\infty) = 0, \qquad y'(\infty) = y_2(\infty) = 0 \tag{6}$$


That is, the two-point BVP (5) reduces to the initial value problem (IVP) when 
the BCs in (5) are replaced by the ICs in (6). The IVP is then 


$$\frac{dy_1}{dt} = f_1(t, y_1, y_2) = y_2, \qquad \frac{dy_2}{dt} = f_2(t, y_1, y_2) = -\frac{2t\,y_2}{\sqrt{1-\lambda y_1}},$$
$$y_1(\infty) = 0, \qquad y_2(\infty) = 0 \tag{7}$$


The ∞ in ICs (6) turns out to be around 3.18 for all practical purposes. The
solution of this IVP will not be able to get out of zero if we use exactly the ICs (6),
replacing ∞ by, say, 3.1501. If, instead of zero in (6), we use a small arbitrarily chosen
nonzero value and replace ∞ by 3.1501, we will be able to get out of zero,
but the solution will in general be wrong.
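The behaviour just described can be tried out with a few lines of code. The following is a minimal Python sketch, not the authors' program: it swaps in a classical fixed-step RK4 integrator for Matlab's ode23, so its digits need not agree with the ode23-based values reported later in this section. The function names rhs and integrate_backward are hypothetical.

```python
import math

def rhs(t, y1, y2, lam):
    """Right-hand side of system (7): returns (dy1/dt, dy2/dt)."""
    # math.sqrt raises ValueError when 1 - lam*y1 < 0, i.e. where Matlab
    # would continue with complex values.
    return y2, -2.0 * t * y2 / math.sqrt(1.0 - lam * y1)

def integrate_backward(lam, y2_end, t_end=3.1501, t_start=0.0001, dt=0.001):
    """Classical RK4 from t_end down to t_start with y1(t_end) = 0, y2(t_end) = y2_end."""
    n = round((t_end - t_start) / dt)
    h = -(t_end - t_start) / n  # negative step: integrate toward t = 0
    y1, y2 = 0.0, y2_end
    for i in range(n):
        t = t_end + i * h
        k1 = rhs(t, y1, y2, lam)
        k2 = rhs(t + h / 2, y1 + h / 2 * k1[0], y2 + h / 2 * k1[1], lam)
        k3 = rhs(t + h / 2, y1 + h / 2 * k2[0], y2 + h / 2 * k2[1], lam)
        k4 = rhs(t + h, y1 + h * k3[0], y2 + h * k3[1], lam)
        y1 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y2 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return y1, y2

print(integrate_backward(0.3, 0.0))       # exact zero ICs: the solution never leaves zero
print(integrate_backward(0.3, -5.36e-5))  # small negative slope: the solution leaves zero
```

With y2(3.1501) = 0 every stage of the integrator is zero, so the computed solution stays at zero for all t; any nonzero slope produces a nontrivial curve whose value at t = 0.0001 then has to be inspected.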

Although both y(t) and y'(t) will be zero at ∞, they approach zero at
different speeds as t goes toward ∞, and hence we will not know at t = ∞ ≈ 3.1501
the value of y and the corresponding sufficiently accurate value of y'. Under these
circumstances, human-computer interaction along with the knowledge of the lower
and upper solutions turns out to be an excellent way of tracking the actual numerical
solution of the gas flow problem.

We are implicitly accustomed to running a computer program and obtaining the
required solution straightaway in one run. We do not expect to run a program more
than once, inspect the result of each run, and then modify the concerned input
parameter(s) for the next run.

Running a program more than once for the sake of the required solution, and for
the sake of not missing one, is definitely a very easy option, and in this particular
situation it appears to be attractive and almost unavoidable. One may ask why we do
not automate even this inspection and run the resulting program only once. From
the artificial intelligence point of view this is possible, but natural intelligence
(human-computer interaction) has an edge in this respect.

Further, to automate, we sometimes need to spend enormous programming effort
and consequently excessive time writing and debugging code. Unforeseen bugs
can still remain, and these may be discovered in future by studying some suspicious
output produced by a run of the code for some problem (by some user somewhere) [29].

Hence we simply write a small Matlab program for the lower and upper solutions
and also for the actual numerical solution of the ODE (IVP) and run it a few times.
We inspect the solution and modify the input parameter, viz., y2(3.1501), after each run
to get the required solution of the IVP (7), replacing ∞ by 3.1501. In this venture
the lower and upper solutions have an unavoidable role. These tell us whether our
choice of the value of y'(3.1501) = y2(3.1501) is appropriate or not and what
corrective measure(s) we should take for this value. Observe that we have always kept
y(∞) ≈ y(3.1501) = y1(3.1501) = 0, as this condition is specified in the
problem.

We obtain the lower bound of y'(∞) = y2(∞) ≈ y'(3.1501) = y2(3.1501) from
the lower solution α(t) using the forward difference. Similarly, we get the upper
bound of y'(∞) = y2(∞) ≈ y'(3.1501) = y2(3.1501) from the upper solution β(t).
These bounds are shown in the Matlab program as ydl and ydu, and we consider the
computation of these bounds reasonable as we do not have any better way to do so [29].
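The two difference quotients can be reproduced without numerical quadrature. The Python sketch below is an illustration under stated assumptions: it uses the closed forms α(t) = erfc((1 − λ)^(−1/4) t) and β(t) = erfc(t), which follow from (3) by the substitution u = (1 − λ)^(−1/4) x (an observation added here, not taken from the source). The function name slope_bounds is hypothetical, and the default t_end = 3.151 mirrors the t = dt*i indexing used in the Matlab listing.

```python
import math

def alpha(t, lam):
    """Lower solution of (3) in closed form: erfc((1 - lam)**(-1/4) * t)."""
    return math.erfc((1.0 - lam) ** -0.25 * t)

def beta(t):
    """Upper solution of (3) in closed form: erfc(t)."""
    return math.erfc(t)

def slope_bounds(lam, t_end=3.151, dt=0.001):
    """Difference quotients of beta and alpha over the last grid step [t_end - dt, t_end]."""
    ydu = (beta(t_end) - beta(t_end - dt)) / dt
    ydl = (alpha(t_end, lam) - alpha(t_end - dt, lam)) / dt
    return ydu, ydl

ydu, ydl = slope_bounds(0.3)
print("ydash(3.1501) lies between", ydu, ydl)
```

The β-based quotient does not depend on λ and comes out close to −0.5518 × 10^-4. The α-based quotient need not agree digit for digit with values produced by the Matlab program, because the program evaluates the integrals with quad rather than in closed form.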

The Matlab program. The following Matlab program provides us with the lower
and upper solutions, their corresponding graphs, and the actual numerical solution of the
IVP (7) with y1(3.1501) = 0 and y2(3.1501) chosen as a small negative value. From
the LUSOL that encompass the true ODE solution, it can be seen that the solutions
are monotonically decreasing. This decrease implies that the first derivative, viz.,
y'(t) = y2(t), is necessarily negative for all finite values of t.

For a fixed value of λ and for t = 3.1501 : −0.001 : 0.0001, we run the Matlab
program a number of times. Each run takes just a few seconds. After each run we
inspect the value of y1(0.0001), which should be 1, and then modify the values of
y1(3.1501) and y2(3.1501) accordingly.

Within a couple of minutes, we will get sufficiently accurate values of
y1(3.1501) and y2(3.1501) that would result in y(0.0001) numerically very close to
1. Observe that writing a complicated, lengthy Matlab program for complete automation
(no human-computer interaction) is not only more expensive but may also miss
situations which can be seen by a human more easily in an interactive mode. The
Matlab program [29] is as follows.


%REMEMBER TO CHANGE (i) VALUE OF LAMDA IN ydash PROGRAM
function []=lusolgas(lamda, dt, y0);
format long;
%Take a real number for lamda in (0, 1), say lamda = 0.9.
%dt = interval in independent variable t
%y0=[y1, y2]=[y(3.1501), y'(3.1501)]=[0, y'(3.1501)]
%where a (negative) value of y'(3.1501) lying between lower value of
%y'(3.1501) and upper value of y'(3.1501) needs to be supplied.
L=lamda; 'lamda =', L,
p=sqrt(pi); LM=1-L; c=1/sqrt(LM);
c1=2*(LM^-.25)/p;
%fd is not listed in the source; the integrand below is assumed from Eq. (3).
fd=@(x,c) exp(-c*x.^2);
tspan=.0001:dt:3.1501; %tspan=3.1501:-.001:.0001; or %tspan=.0001:.001:3.0501;
tsize=size(tspan); ntstep=tsize(1,2);
%Lower solution alp (dotted line)
for i=1:ntstep, t=dt*i; ff(i)=quad(@(x)fd(x, c),0,t); alp(i)=1-c1*ff(i); end;
plot(tspan,alp,'b:'),
hold on;
%Upper solution beta (dashed line)
p=sqrt(pi); f='exp(-x.^2)';
for i=1:ntstep, t=dt*i; ff(i)=quad(f,0,t); beta(i)=1-(2*ff(i)/p); end;
plot(tspan,beta,'k--'),
ts=.0001:50*dt:3.1501; j=0; for i=1:50:3151, j=j+1; al(j)=alp(i); bet(j)=beta(i); end;
%Mask the above line by putting a % just before ts if 3150 lines of output are required
'  t  lower sol  upper sol',
%[tspan' alp' beta'] % Keep this line if 3150 lines of output are required.
[ts' al' bet']
%Bounds on y'(3.1501) from the last two grid values of beta and alp
ydu=(beta(ntstep)-beta(ntstep-1))/dt; ydl=(alp(ntstep)-alp(ntstep-1))/dt;
'ydash(3.1501) lies between ', [ydu ydl]
hold on;
%Integrate the IVP backward from t = 3.1501 to t = .0001
t=tspan; t=3.1501:-dt:.0001; tspan=t;
[t, y]=ode23('ydash',tspan,y0); soln=[t,y];
j=0; for i=1:50:3151, j=j+1; tt(j)=t(i); yy(j)=y(i, 1); yyy(j)=y(i, 2); end; sol=[tt' yy' yyy'], %Prints only 63 values.
plot(tspan, y(:,1), 'r-.');
legend('dot implies lower soln.','dash implies upper soln.', 'dashdot implies ODE soln.');

%Separate program (file) ydash, called by ode23 above
function ydash=f(t, y);
ydash=[y(2); -2*t*y(2)/sqrt(1-.9*y(1))]; % lamda has been taken as .9


4.2 Numerical Results and Their Graphical Visualization 


We discuss the numerical results for λ = 0, 0.3, 0.99, always keeping Δt = dt = 0.001.
We have also produced results for several other values of λ [29], but we
omit these results to conserve space, since they are not extreme solutions and provide
no information significantly different from that given by the extreme values of λ above.
However, one can easily run the Matlab program with appropriate input data
a few times to get the desired solution quite comfortably and confidently [29].


(i) Parameter λ ≈ 0. The lower solution α(t) in (3) almost coincides with the
upper solution β(t) in (3) and hence the solution of the BVP (5) is y(t) ≈ β(t),
where α(t) = β(t) when λ = 0 exactly. The solution can be visualized through
the graph in Fig. 2. Here y'(3.1501) is taken as −0.0000565, which is close to (but
not exactly within, since the numerical interval has length 0) the value
of y'(3.1501) ∈ [−0.5518 × 10^-4, −0.5518 × 10^-4] computed from the lower
and upper solutions.

Fig. 2 λ = 0.0. At t = 3.1501 the initial condition used is y0 = [y, y'] = [y1 y2] =
[0 −0.0000565]. The dashdot line representing the solution of the BVP (5) is practically enclosed
between the 'dot' line representing the lower solution and the 'dash' line representing the upper
solution (as it should be). The graph, as expected, depicts that all the three lines overlap one another


Numerically we expect y'(3.1501) = −0.00005518, derived from the
LUSOL (3), to be a better approximation than y'(3.1501) = −0.0000565. However, the
approximation y'(3.1501) = −0.0000565, which is numerically quite close, is justifiable
because the parameter λ in the ODE (5) is defined not for λ = 0
but for λ > 0. By issuing the Matlab commands

>> clear all; lusolgas(0, 0.001, [0, -0.0000565])

we get the lower numerical solution α(t) collapsed onto the upper numerical solution
β(t) and the ODE (5) solution y(t) almost grazing the LUSOL. Hence the condition
at t = ∞, i.e., numerically in this given gas flow problem, at t = 3.1501, is y0 =
[y, y'] = [y1 y2] = [0 −0.0000565].

This condition produces y(0.0001) = 1.00388068043053, which has the absolute
error 0.00388068043053 that is also the relative error [5]. For all practical
implementations, this result is good enough. For the foregoing value, viz., y'(3.1501) =
−0.0000565, the LUSOL and the actual solution of the BVP (4) can be visualized
from the graph (Fig. 2), in which λ = 0.0 and at t = 3.1501 the initial
condition used is y0 = [y, y'] = [y1 y2] = [0 −0.0000565].


(ii) λ = 0.3, Δt = dt = 0.001, IC: y0(∞) ≈ y0(3.1501) =
[0 y'(3.1501) value]. Using the Matlab programs lusolgas and ydash through
the Matlab commands

>> clear all; lusolgas(lamda, dt, y0)

where lamda = 0.3, dt = 0.001, y0 = [0, a value between the lower and upper values of
y'(∞) ≈ y'(3.1501) computed from the given analytical lower and upper solutions
using the Matlab program], we obtain a solution for each numerical value of y'(∞) ≈
y'(3.1501). Table 5 is the result of several runs of the Matlab program lusolgas, just
by varying the value of y'(∞) ≈ y'(3.1501) = y2(3.1501) within the interval
[−0.5518 × 10^-4, −0.0863 × 10^-4] so that we get y(0.0001) ≈ 1 [29].

Table 5 λ = 0.3, y' ∈ [−0.5518 × 10^-4, −0.0863 × 10^-4] at t = 3.1501

IC: y0 = [y y'] = [y1 y2] at t = 3.1501      Corresponding [y y'] at t = 0.0001
[0 −0.000050]                                 [0.92848377710353 −1.07734935909093]
[0 −0.000055]                                 [1.02780468112015 −1.19657160136354]
[0 −0.000054]                                 [1.00784910878842 −1.17254798913937]
[0 −0.000053]                                 [0.98794076063587 −1.14861612729507]
[0 −0.0000538]                                [1.00386332953910 −1.16775394479522]
[0 −0.0000537]                                [1.00187110452635 −1.16535825125438]
[0 −0.0000536]                                [0.99987932212003 −1.16296344222142]
[0 −0.00005360605878832885]                   [0.99999998741054 −1.16310851347700]
(obtained by linear interpolation)

Hence the condition at t = ∞, i.e., numerically in this given gas flow
problem, at t = 3.1501, is y0 = [y, y'] = [y1 y2] = [0 −0.0000536]. This
condition produces y(0.0001) = 0.99987932212003, which has the absolute
error 1.206778799700548e-004 that is also the relative error [5]. For all practical
implementations, this result is good enough.

The lower and upper solutions and the actual solution of the BVP (4) can be
visualized from the graph (Fig. 3), in which λ = 0.3 and at t = 3.1501 the
initial condition used is y0 = [y, y'] = [y1 y2] = [0 −0.0000536].

However, by the linear interpolation,

$$y'(3.1501)_{\text{better}} = \frac{\alpha_3 - \alpha_2}{\alpha_1 - \alpha_2}\,\beta_1 + \frac{\alpha_3 - \alpha_1}{\alpha_2 - \alpha_1}\,\beta_2 \tag{7}$$

we obtain, at t = 3.1501,

$$y_0 = [y, y'] = [y_1\ y_2] = [0\ \ {-0.00005360605878832885}], \tag{8}$$

where

α1 = 1.00187110452635, α2 = 0.99987932212003, α3 = 1, β1 = −0.0000537, β2 = −0.0000536.

Here α1 and α2 are two computed values of y(0.0001), α3 is the required value of y(0.0001),
and β1 and β2 are the corresponding trial values of y'(3.1501).

This condition produces y(0.0001) = 0.99999998741054, which has the absolute
error (also relative error here) 1.258946003002137e-008.

Fig. 3 λ = 0.3. At t = 3.1501 the initial condition used is y0 = [y, y'] = [y1 y2] =
[0 −0.0000536]. The dashdot line representing the solution of the BVP (5) is enclosed between the
'dot' line representing the lower solution and the 'dash' line representing the upper solution (as it
should be)
If we take the top two values of Table 5, i.e., if

α1 = 0.92848377710353, α2 = 1.02780468112015, α3 = 1, β1 = −0.000050, β2 = −0.000055,

we obtain, at t = 3.1501, y0 = [y, y'] = [y1 y2] =
[0 −0.00005360026036837637], which itself is sufficiently accurate though
not as accurate as that in Eq. (8).
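The two interpolations above are easy to recheck. The short Python sketch below evaluates the linear interpolation formula for both pairs of Table 5 rows; the function name interpolate_slope is hypothetical, and a1, a2 stand for the two computed values of y(0.0001), b1, b2 for the corresponding trial values of y'(3.1501), and a3 for the target value of y(0.0001).

```python
def interpolate_slope(a1, b1, a2, b2, a3=1.0):
    """Linear (two-point Lagrange) interpolation of the slope that should give y(0.0001) = a3."""
    return (a3 - a2) / (a1 - a2) * b1 + (a3 - a1) / (a2 - a1) * b2

# Two close rows of Table 5 (the pair used for Eq. (8))
close_pair = interpolate_slope(1.00187110452635, -0.0000537,
                               0.99987932212003, -0.0000536)
# Top two rows of Table 5
top_pair = interpolate_slope(0.92848377710353, -0.000050,
                             1.02780468112015, -0.000055)
print(close_pair, top_pair)
```

Both results agree with the values quoted in the text to well beyond the digits that matter in practice; the closer the two trial slopes, the less the interpolated value depends on the assumption that y(0.0001) varies linearly with y'(3.1501).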

However, if we attempt to interpolate considering, for t = 3.1501,

y0 = [y, y'] = [y1 y2] = [0 −0.00005518], y0 = [y, y'] = [y1 y2] = [0 −0.00000863],

α1 = 1.03140142306987, α2 = 0.15116373228873, α3 = 1, β1 = −0.00005518, β2 = −0.00000863,

we obtain, at t = 3.1501, y0 = [y, y'] = [y1 y2] = [0 −0.0000100302473].
The second component of y0 is far away from that of the actual y0,
although the value −0.0000100302473 remains well within the interval y' ∈
[−0.5518 × 10^-4, −0.0863 × 10^-4], which is still good in this case of λ = 0.3.

Fig. 4 λ = 0.3. At t = 3.1501 the initial condition used is y0 = [y, y'] = [y1 y2] =
[0 −0.0000100302473]. The actual BVP solution ('dashdot' line) has gone outside the lower and
upper solution bounds, which is not desired

However, the actual BVP solution (Fig. 4) has gone outside the bounds of LUSOL,
which is not desired. This out-of-bounds BVP solution implies that y'(∞) ≈
y'(3.1501) = −0.0000100302473, as one of the initial conditions derived numerically
from the knowledge of the LUSOL (Matlab program), is unacceptable. We will
discover that such successive interpolation may not always succeed in maintaining
the lower-upper solution bounds, particularly for higher values of 0 < λ < 1 [29].
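The visual check "does the trial solution stay between the lower and upper solutions?" can also be written down as a simple test. The Python sketch below is an illustration, not part of the authors' program: within_lusol is a hypothetical helper, and it evaluates the bounds through the closed forms α(t) = erfc((1 − λ)^(−1/4) t) and β(t) = erfc(t), which follow from (3) by a change of variable (an added observation, not stated in the source).

```python
import math

def within_lusol(ts, ys, lam, tol=0.0):
    """True if alpha(t) - tol <= y <= beta(t) + tol at every sampled point (t, y)."""
    k = (1.0 - lam) ** -0.25
    return all(math.erfc(k * t) - tol <= y <= math.erfc(t) + tol
               for t, y in zip(ts, ys))

ts = [0.5, 1.0, 2.0]
k = (1.0 - 0.3) ** -0.25
inside = [(math.erfc(k * t) + math.erfc(t)) / 2 for t in ts]  # midway between the bounds
outside = [math.erfc(t) + 0.05 for t in ts]                   # above the upper solution
print(within_lusol(ts, inside, 0.3), within_lusol(ts, outside, 0.3))
```

A small positive tol may be useful when the trial solution is expected to graze one of the bounds, as in the case λ ≈ 0.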


(iii) λ = 0.99, Δt = dt = 0.001, IC: y0(∞) ≈ y0(3.1501) =
[0 y'(3.1501) value]. As in the foregoing example, we issue the command

>> clear all; lusolgas(0.99, 0.001, y0)

substituting arbitrarily, for y0, a vector [0 −0.000005] just to get the interval, through
the Matlab program, in which the value of y'(3.1501) lies. The value −0.000005
has no role to play in determining this interval. This is because we have included
in one Matlab program, lusolgas, all three solutions, viz., the lower, upper, and ODE
solutions. However, in subsequent runs of this program, the value −0.000005 needs
to be changed by choosing a value from the interval so that, on inspection,
the solution y(0+) ≈ y(0.0001) comes out as 1, as it should.

We thus get the output of the Matlab program, viz., the interval y'(3.1501) ∈
[−0.0000551842736, −0.0000031353000], and then generate Table 6 [24].

In Table 6, we see from the last but two rows that the values of y(0+) ≈ y(0.0001)
are α1 = 0.81884983879549, α2 = 0.94820294310335, corresponding to the values
of y'(∞) ≈ y'(3.1501), which are β1 = −0.00004, β2 = −0.000045, with α3 = 1. These
values are now sufficiently good for a linear interpolation.

Table 6 λ = 0.99, y'(3.1501) ∈ [−0.551842736 × 10^-4, −0.031353 × 10^-4]. Experiment taking
different values of y'(3.1501) within the foregoing interval to determine the actual value of y'(3.1501)

IC: y0 = [y y'] = [y1 y2] at t = 3.1501      Corresponding [y y'] at t = 0.0001
[0 −0.0000050]                                [0.08678818619802 −0.09872914228022]
[0 +0.000002]                                 [−0.03304561541990 0.03718214222101]
[0 −0.000002]                                 [0.03339242064911 −0.03779783102275]
[0 −0.000009]                                 [0.16073993761272 −0.18417750296482]
[0 −0.00009]                                  [2.3019 + i0.5775 −2.6126 − i1.5022] Complex
[0 −0.00005]                                  [1.0933 − i0.0017 −1.5050 − i0.0394] Complex
[0 −0.0000005]                                [0.00756293042976 −0.00854068052541]
[0 −0.00001]                                  [0.17955147400002 −0.20612253201888]
[0 −0.00004]                                  [0.81884983879549 −1.02798074756373]
[0 −0.000045]                                 [0.94820294310335 −1.22987179675258]
[0 −0.000047] (by interpolation)              [1.00365078118975 −1.32952553312948]

From Formula (7) we get y'(3.1501)_better = −0.00004700215747329007. The
corresponding value of y(0.0001) is 1.00365078118975 (taking the first six decimal
places of y'(3.1501)_better), which is close to 1 and sufficiently accurate for practical
implementation. From the way the values in Table 6 fluctuate, it is clear that neither
interpolation between the end points of the foregoing interval y'(3.1501) ∈
[−0.551842736 × 10^-4, −0.031353 × 10^-4] nor bisection would work, since the
unacceptable complex values (Table 6) could appear. The graph (Fig. 5) depicts the
lower, upper, and actual solutions of the BVP (4).


4.3 Critical Observations for Sensitive BVPs 


Why only a program for IVP and not for BVP? In the gas flow problem under
consideration, we have the mathematical model which is a two-point BVP associated with
a nonlinear second order ODE. The model is usually solved iteratively (not
mathematically directly) due to nonlinearity. The programming effort for this BVP, the
physical length of the (Matlab) code (program), and the concerned debugging time
are significantly greater than those for the corresponding IVP.

Fig. 5 λ = 0.99. At t = 3.1501 the initial condition used is y0 = [y, y'] = [y1 y2] =
[0 −0.000047]. The dashdot line representing the solution of the BVP (5) is enclosed between
the 'dot' line representing the lower solution and the 'dash' line representing the upper solution (as
it should be)

Because of the sensitivity of the problem, which increases significantly as λ
approaches 1, there is a good chance of missing the actual solution of the BVP
besides obtaining unacceptable complex values in the actual ODE (BVP) solution
y(t). This is due to numerical errors (instability) resulting from the square-rooting
operation as well as from subtracting nearly equal numbers [5], both inherent in this problem.
It can be seen from the graph of the solution that the line representing the solution
descends sharply (though not exactly vertically), implying that for a slight change in the
independent variable t there is a large change in the dependent variable y.

Anybody would like to write a complete comprehensive program for the BVP
and produce the end result (solution) by running the program only once. Due to the
foregoing two major problems, however, we prefer to write a simple, concise, easily
comprehensible program for the IVP (ODE). This concise program, though it
needs to be run a few times (more than once), has a number of advantages.

It (i) needs very little programming effort, (ii) requires very little or no debugging,
(iii) has no chance of missing a solution, and (iv) gets us the trial solution
noniteratively for the unknown initial condition, viz., the value of y'(∞) ≈ y'(3.1501)
chosen arbitrarily but as appropriately as possible by using human judgment (natural
intelligence) based on the knowledge of the physical problem and the concerned
mathematical model.

Further, by (human) inspection of one or two trial solutions, we can easily decide
on a significantly better initial condition and then produce the next, superior solution.
Thus, repeating this process a couple of times, we get two fairly close values for the
unknown initial condition and the corresponding solutions. Only one application of
the linear interpolation (7) at this point provides us with a sufficiently accurate solution.
This solution serves our purpose for all real world problems.

Combination of natural and artificial intelligence versus artificial intelligence.
The concept of combining the natural (human) intelligence and artificial (computer- 
produced) intelligence in a computer program is not often very much stressed. There 
has been implicitly an effort to rely on only artificial intelligence (intelligence simu- 
lated by a computer) and consequently write a time-consuming lengthy computer 
program for obtaining the desired result in one go (run). 

If we weigh both pros and cons, we will discover that there is certainly a scope 
for the combination. The considered gas problem is probably an example for such a 
scope. While we, the humans, want computer to do everything for us, it is sometimes 
more desirable to allow a combination of human intelligence with the computer 
program in a real time mode along with more than one run of the program. 

The limitations of artificial intelligence (such as a situation created during the
execution of the program and unforeseen by the programmer) are countered by
natural intelligence. Artificial intelligence (the computer), on the other hand,
is mistake-proof and often (in computation) fast, unlike the natural intelligence (of
common humans).

Why not a finite difference scheme? For the concerned nonlinear BVP (5), it is
possible to set up a finite difference scheme which will be a system of nonlinear 
algebraic equations. One needs to solve this system iteratively. Both from computer 
storage and from programming points of view taking into account the sensitivity, 
such a proposition is not attractive. 

Speed at which y(t), y'(t) → 0 as t → ∞ is vitally important. Mathematically,
y(∞) = 0, y'(∞) = 0. These two conditions imply that the BVP (5) is already an
IVP and we should be able to integrate backward assuming a reasonable numerical
value (for a real world problem) for ∞. We will, however, soon discover that the step
by step integration will not be able to come out of the zero values for the solution,
which are completely wrong.

From the numerical computation point of view, although both y(∞), y'(∞) = 0,
these tend to zero at different speeds. We always retain y(∞) = 0 while y'(∞) will
be chosen as an appropriate negative nonzero value by some numerical experiment
such as the one shown in Sect. 3. The correct choice will produce y(0) = 1. In the
gas problem, ∞ is taken as 3.1501 and 0 is taken as 0.0001. For the concerned real
world problems, these values are good enough.

The importance of mathematically derived lower and upper solutions. We have
discussed the need for the knowledge of lower and upper solutions for this particular
mathematical model corresponding to the gas flow problem in Sect. 4.1. It is not
necessary, in general, to know these solutions while solving the BVP associated with
a second order ODE. However, for a sensitive model, we are likely to miss a solution
once we deviate from the lower and upper solutions.

Consequently, we may be far away from the region of interest and get lost in
the vast space. In fact, once the program starts drifting away from the solution, it
has practically no chance of returning to the solution in this particular model unless
the program has a provision to discover that it is truly proceeding in the wrong
direction. Including such a provision and bringing the solution procedure back
onto the proper track involves significant programming effort. Further, we will still
not be sure that no unforeseen situation can arise. The lower and upper solutions
do keep the ODE solution well on track and help us not to miss an ODE solution,
which is otherwise not unlikely.

Is there a scope for interpolation? Resorting to interpolation at the very beginning,
i.e., using the initial lower and upper values of y’(3.1501) may lead to a value 
of y’(3.1501) that could produce a solution y(t) very much outside the LUSOL. 
Returning from this value to one that would provide the solution y(t) within LUSOL 
and that would satisfy the BCs may not happen. One may drift away from the true 
solution y(t) without any chance of return. 

To rectify this situation, one might keep a test in the program to detect such a
situation and then take corrective measures, probably using some other procedure. This
would involve much more programming effort and could still not be bug-free. Hence
the interactive approach proposed here is more desirable. However, in this interactive
approach, when two values of y'(∞) = y2(∞) ≈ y'(3.1501) = y2(3.1501) are
sufficiently close as in Tables 4 and 5, we resort to only one linear interpolation. This would
produce a sufficiently accurate value of y'(∞) = y2(∞) ≈ y'(3.1501) = y2(3.1501)
for all real world applications.


5 Conclusions 


Numerical measurements: how these should be obtained. Associated with every
measuring device, there is a fixed order of error or, reciprocally, a fixed order of 
accuracy. Whenever we use the resulting measurements as input, we know (or should 
know) the order of error which is constant. For example, when we measure the 
diameter of a sphere using, say, a slide caliper or a screw gauge, we take several, say 
30, measurements of the diameter, maybe, in different positions. Each measurement 
has an error. Thus all the errors, known to be random and follow a normal distribution 
statistically, have the same order. We then take the average of all these measurements. 
This average is the required (maximum likelihood) diameter. The error associated 
with the average has a fixed order. This fixed order is primarily the same as the order 
of error in each of the measurements. In other words, this fixed order is the same as 
the order of error associated with the measuring device. 

If the errors have different orders, then the resulting accuracy will be that due to the
highest order of error in one of the measurements, viz., the worst one. In our real world
situation, fortunately, the order of error remains the same in all the measurements
when only one device is used for all the measurements. Consequently, the average
will also have the same order of accuracy, but will be the most likely or statistically
best measurement associated with the device as per normal distribution theory.

Consider, as an example, a time measuring device, viz., an electronic timer that
can measure time up to, say, 10 min. Assume that the timer is an extremely high
precision device capable of measuring time with an accuracy of 0.005%. If this timer
is used to measure time for a 100 m sprint, then it can measure time with an error which
is not less than 10 × 60 × 0.00005 = 0.03 s. The sprint time will usually be in
the interval [9, 12 s]. If a sprinter's time has been noted by this device as 9.69 s, then
the exact time will lie in the interval [9.66, 9.72 s] with a high degree of probability.
We will never know the exact time (a real quantity) under any circumstances at any
time in future. But we will definitely (with 100% probability) know the interval in
which the exact time lies if this interval is made sufficiently large, say [8.5, 11.00 s].
If the electronic timer is capable of measuring time up to 1 min and, as before, it is an
extremely high precision device with an order of accuracy of 0.005% [5], then it can
measure time with an error which is not less than 1 × 60 × 0.00005 = 0.003 s. If a
sprinter's time has been noted by this device as 9.69 s, then the exact time will lie in
the interval [9.687, 9.693 s] with a high degree of probability.
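The arithmetic of the two timer examples can be written as a one-line rule: the error is the full-scale range of the device times its relative accuracy. The Python sketch below only restates that rule; the function name timing_interval and its parameter names are hypothetical.

```python
def timing_interval(reading_s, full_scale_s, accuracy_percent):
    """Return (low, high) bounds for a reading, with error = full scale x relative accuracy."""
    err = full_scale_s * accuracy_percent / 100.0
    return reading_s - err, reading_s + err

print(timing_interval(9.69, 10 * 60, 0.005))  # timer with a 10 min range
print(timing_interval(9.69, 1 * 60, 0.005))   # timer with a 1 min range
```

For the same relative accuracy, the timer with the shorter range gives the narrower interval, which is why the 1 min timer resolves the 9.69 s reading ten times more finely than the 10 min timer.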

In the case of measuring the sprinting time, we do not have the scope to measure 
the time more than once. Unlike the case of measuring the diameter of a sphere 
a number of times, we depend on only one reading of sprint time and determine 
the quality of measurement of time. So no normal distribution theory arises here in 
this kind of measurement. One may, however, use more than one, say 30, electronic
timers, each having the same order of accuracy or, complementarily, the same order of
error. In the latter case, the normal distribution theory can be used, i.e., the average of
the 30 times recorded may be taken for each sprinter and this average time may be
associated with each sprinter. There is yet another way, which is possibly followed
to rank a sprinter in all major sprinting events. This is related to visualization of the
finish line by the referees/judges. The sprinting video tells the judges which sprinter
crossed the finish line first and which second. All these are, however, never totally
error-free. It is not at all impossible that two or more sprinters touch or cross the finish
line simultaneously or within a much smaller time interval, say 0.0001 s. Timers
capable of measuring time up to 1 min with an accuracy of 0.003 s will simply
not be able to detect this interval. Visualization by human judges or robots is also
not error-free.

Confidence level in statistics and in deterministic numerical computation. As in
the foregoing case, we will never know the exact time under any circumstances at 
any time in future. But we will definitely know the interval in which the exact time 
lies if this interval is made sufficiently large, say [9.5, 10.00 s]. However, the broader 
the interval is, the more confidence, say 99% confidence we have on the existence 
of the exact quantity in the interval. If the interval is sufficiently broad then we are 
100% confident about the existence of the exact quantity in this interval. But such a 
broad interval in which the exact quantity surely exists is often not much useful in 
statistics. The narrowest possible interval in which the probability of the existence 
of the exact quantity is reasonably high, say 95%, is most desirable in statistics. 
It may be noted that, in statistics, we do not talk about a 100% confidence level or,
equivalently, a 0% significance level [37, 38]. The confidence level is always less than
100% in all statistical computations for any real world problem.
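The trade-off between the width of an interval and the confidence attached to it can be made concrete. The Python sketch below assumes a normally distributed error with a known standard deviation; the reading 9.69 s is taken from the sprint example, while sigma = 0.015 s and the function name interval_for_level are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def interval_for_level(reading, sigma, level):
    """Symmetric interval reading +/- z*sigma that contains the exact value with
    probability `level`, assuming a normally distributed error (0 < level < 1)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return reading - z * sigma, reading + z * sigma

for level in (0.90, 0.95, 0.99):
    lo, hi = interval_for_level(9.69, 0.015, level)
    print(level, round(lo, 4), round(hi, 4))
```

The interval widens as the confidence level rises, and under the normal model no finite interval corresponds to level = 1; the standard library reflects this by rejecting that input in inv_cdf.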

On the contrary, whatever numerical computations we do in a deterministic envi- 
ronment, we necessarily compute the error bounds along with the solution/result to 
scientifically justify the quality of the solution [5]. The error bounds provide us the 
interval in which the exact solution/result lies. Since, here in deterministic numerical 
computation, we are 100% confident about the existence of the exact solution in the 
interval, we do not explicitly bring into picture the confidence level at all. 

Optimization: Importance of randomized/heuristic algorithms. In linear and
nonlinear programming problems when the regions containing the optimal solutions 
have been made small, one may employ heuristic or/and evolutionary approaches, 
(besides, however, iterative deterministic approaches) to track down the optimal solu- 
tion accurately for ill-conditioned problems; one may alternatively employ error-free 
methods to obtain exact optimal solution for bad optimization problems [39-43]. The 
main motive is always to obtain a quality solution for any problem, ill-behaved or 
not, with uniform accuracy.

Energy, Carbon and Water Efficiency
of the Most Sustainable Corporations
in the World


Alina Barbulescu and Simona Mascu 


Abstract The present study addresses the amount of economic output possible at
a given level of energy supply, carbon (greenhouse gas emissions), water use and
waste generated, as recorded in 2014 by the 100 most sustainable corporations in the
world. We use an integrated approach that assesses how green businesses operate, and
we analyse the relationship between energy, carbon and water based on the data
from the top 100 sustainable companies in the world. We propose two models: one for
the dependence of carbon efficiency on energy efficiency, and one for its dependence on
energy and water efficiency. The results prove the existence of high correlations between
carbon efficiency, water efficiency and energy efficiency. Comparing the models, it results
that water has a smaller influence than energy on the carbon efficiency.


Keywords Model · Carbon productivity · Energy productivity · Water
productivity · Sustainable business


1 Introduction 


Sustainable businesses should be committed to environmentally compatible protec- 
tion by defining measurable and controllable goals based on the costs when setting 
corporate objectives. Recent green marketing studies [8] provide a list of various 
methods used by retailers to embed sustainability into their activity as product 
design innovations, responsible sourcing, recycling practices, building green product 
credibility and consumer engagement practices. 

Bocken et al. [4] found that the majority of sustainable business models can serve
as a vehicle to coordinate technological and social innovations with the system-level
sustainability to organize the business activity in a greener way. Green companies
create new businesses that help the global environment and contribute to a sustainable
development of human society by providing products and services through the
development of innovative technologies. Researchers [11] suggest that a real green
business needs efficient market research methods and greener, credible communication
for taking advantage of existing trends and letting consumers know about official
certification for green products.

Sustainable business activities of green marketing oriented companies reflect their
policies and practices towards the natural environment. They focus on reducing the
environmental impact of their activities using metrics regarding their environmentally
friendly products, services and practices, such as their energy, carbon, waste and water
inputs/outputs. Further research [1, 12] led to significant developments in
understanding how to accomplish the marketing of sustainable business and green products
in an efficient and profitable manner.

Environment, economy and social justice, which are the three pillars of sustainability,
have increasingly become parts of marketing decision-making [10]. The sustainable
business market provides clear goals and objectives that give a strategic umbrella
for enterprise development and internal initiatives. When companies are “going
green” by developing a sustainable business model, they not only do good to
society, but also develop the competence and capability to be market-driving.
This will ensure a lasting competitive advantage [2, 13].

Figure 1 shows the elements of the green marketing mix and gives a broad
understanding of green marketing policies, practices and goals. Henri and Journeault
[9] found that traditional polluting industries have been inclined to disclose their
environmental information under intense pressure from their stakeholders. This
applies in particular to manufacturing industries such as the chemical and petroleum
industries.


[Fig. 1 is a diagram; only part of its text is recoverable. Legible panel labels:
"Green Market", "Global Environmental", "Sustainable business market". Listed goals:]

1. Comply with national and international environmental legislation, international
   treaties and agreements, and anticipate future trends;
2. Conserve natural resources by using energy and water wisely and seek further
   opportunities to improve the resource efficiency of stores and supply chain;
3. Provide customers with products that make sustainable use of natural resources;
4. Minimize waste by recycling and using materials and products with recycled content;
5. Reduce greenhouse gas emissions from stores and distribution centres.

[Further labels: reduce waste and recycle; facilities; reduce fuel and greenhouse
gases from transportation.]

Fig. 1 Sustainable business market. Goals, practices, policies
Timothy and Yazdanifard [17] suggest that green marketing procedures are costly
and require long-term planning. Green businesses work to
develop sustainable products that offer real economies for consumers and inspire
customers on how they can make changes to their everyday lives [14].

In February 2015, the European Commission [7] published The 2015 Energy Productivity
and Economic Prosperity Index, which aims to ensure security of supply for
Europe, create a deeper integration of EU national energy markets, reduce energy
demand, and cut carbon emissions.

The Global Sustainability Research Alliance has created a list of twelve quantitative
key performance indicators, including energy productivity, carbon productivity,
water productivity and waste productivity, that reflect how heavily a company’s
products and services affect our planet and communities. The 2015 Global 100
includes corporations from 22 countries. Among the top 10 countries, the United
States had the highest number of companies with 20 (7 more than in 2011). It
was followed by France and Canada (12), the UK (11), Germany and Finland (5),
Australia, Sweden, South Korea and Singapore (4), Denmark, Norway, the Netherlands
and Ireland (2), Japan, China, Switzerland, Brazil, Belgium, Portugal and Spain (1).

As Won-Kys [21] states, industries which introduce new technologies for
increased value-added or green technologies which reduce greenhouse gas emissions
should receive support. Sweden, Ireland and Denmark, among others, have had
longstanding negotiated agreements with industrial companies to provide technical
support and financial incentives in return for energy savings [15, 18].

Water productivity, which measures the revenue per unit of water used, has been an
afterthought in conventional business planning. However, water scarcity, especially in
heavy industries (e.g. mining), has increased the importance of this environmental
indicator. Biloslavo and Trnavcevic [3] identified that waste, although less financially
relevant than energy, carbon or water, is an increasingly important environmental
indicator in its own right for the reduction of pollution in production processes.

As leading retailers, the 100 sustainable companies in the world have a responsibility
to take action to tackle climate change and work towards a more sustainable
future. Velchev et al. [20] showed that companies achieve lower energy consumption
during machining processes by developing a model of direct energy consumption
which depends on the cutting parameters (cutting speed, feed) and by using
an improved model of specific energy consumption. According to Steinberger and
Krausmann [16], resource productivity is an appealing and widespread environmental
sustainability indicator, defined as the ratio of output to resource input. Therefore,
direct emphasis on resource inputs and emissions is ultimately the only way to
ensure effective policy.

In the following we present two models for the relationship between the
energy productivity, carbon productivity, and water productivity of the 100 most
sustainable companies in the world. Data are taken from the report [6]. Definitions
of the indicators considered in the modeling are presented in Fig. 2.
[Fig. 2 is a diagram; recoverable text:]

Energy productivity: how much revenue companies can squeeze out of every unit of
energy they use; it shows which companies are best able to adapt to our changing
energy future.

Carbon productivity: this metric divides a company’s total revenue by total GHG
emissions, and gives us a sense of how companies are exposed to the new GHG
regulatory environment. Equation: Revenue ($US) / GHG

Water productivity: [definition truncated] ... to water scarcity challenges.
Equation: Revenue ($US) / [denominator not recoverable]

Waste productivity: [definition not recoverable]


Fig. 2. Definitions of energy productivity, carbon productivity, water productivity,
waste productivity


2 Methodology 


Firstly, Spearman’s correlation coefficient [19] was used to analyse the correlation 
between two variables X and Y. 
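
As an illustration of this first step, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of the ranks, with tied values sharing their average rank. The function names are ours and the numbers in the usage note are made up; the chapter itself used the R software, so this is not the authors' code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For any strictly increasing relationship, for instance `spearman([1, 2, 3, 4], [1, 4, 9, 16])`, the coefficient equals 1 even though the relationship is not linear, which is why a rank-based measure suits skewed productivity data.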

Secondly, we propose two models, for the dependence of carbon productivity on 
energy productivity and carbon productivity on energy and water productivity. They 
are generalized linear models (GLM) [22] because they reduce to linear models by 
taking logarithms. 

We denote by $(Z_i)$, $(U_i)$ and $(W_i)$ the variables carbon productivity,
energy productivity and water productivity, respectively.

Model 1. The first model gives the relationship between the carbon productivity
and the energy productivity. We propose the equation:


$Z_i = \beta_0 \cdot (U_i)^{\beta_1} \cdot \varepsilon_i,$   (1)


where $\varepsilon_i$ is the random variable and $\beta_0$, $\beta_1$ are the parameters
that must be estimated.

If the values of the estimators $b_0$, $b_1$ of the parameters $\beta_0$, $\beta_1$ are
determined based on the samples $(z_i)$, $(u_i)$, $i = 1, 2, \ldots, n$, the model (1)
can be written as:

$z_i = b_0 \cdot (u_i)^{b_1} \cdot e_i, \quad i = 1, 2, \ldots, n,$   (2)

where $e_i$, $i = 1, 2, \ldots, n$, are the residual values.


Taking logarithms, Eq. (2) becomes:


$\ln(z_i) = \ln(b_0) + b_1 \ln(u_i) + \ln(e_i), \quad i = 1, 2, \ldots, n.$   (3)

Denoting by:

$z'_i = \ln(z_i),\ u'_i = \ln(u_i),\ B_0 = \ln(b_0),\ B_1 = b_1,\ e'_i = \ln(e_i), \quad i = 1, 2, \ldots, n,$   (4)

equation (3) becomes:

$z'_i = B_0 + B_1 u'_i + e'_i, \quad i = 1, 2, \ldots, n.$   (5)
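
The linearization above can be sketched in a few lines: take logarithms of both series, fit an ordinary least-squares line, and exponentiate the intercept to return to the multiplicative form. This is only an illustration under our own assumptions (the function name `fit_power_law` and the synthetic data are ours); the authors performed their estimation in R.

```python
import math


def fit_power_law(u, z):
    """Fit z = b0 * u**b1 by least squares on ln(z) = B0 + B1 * ln(u).

    Returns (b0, b1), where b0 = exp(B0) and b1 = B1.
    All values in u and z must be strictly positive.
    """
    log_u = [math.log(v) for v in u]
    log_z = [math.log(v) for v in z]
    n = len(log_u)
    mean_u, mean_z = sum(log_u) / n, sum(log_z) / n
    # Slope and intercept of the simple linear regression in log space.
    slope = (sum((a - mean_u) * (b - mean_z) for a, b in zip(log_u, log_z))
             / sum((a - mean_u) ** 2 for a in log_u))
    intercept = mean_z - slope * mean_u
    return math.exp(intercept), slope


# Synthetic, noise-free data generated from z = 3 * u**0.5 (not the chapter's data).
u_demo = [1.0, 2.0, 4.0, 8.0]
z_demo = [3.0 * v ** 0.5 for v in u_demo]
b0_demo, b1_demo = fit_power_law(u_demo, z_demo)
```

On noise-free data the fit recovers the generating parameters up to rounding; with real data the residuals in log space are the quantities checked for normality and homoskedasticity.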


The following hypotheses have been checked for the standardized errors:
normality, homoskedasticity and the absence of correlation. For this purpose, the
Kolmogorov-Smirnov test (for normality) and the Levene test (for homoskedasticity)
were performed. The autocorrelation function was also built. The tests were
performed at the significance level of 0.01, and a confidence interval at the
99% confidence level was built for comparison with the values of the
autocorrelation function [5].

Model 2. The second model extends the first one, taking into account the
correlation (0.766) between the carbon and water productivity (Fig. 3).

For the second model we propose the equation:


$Z_i = \alpha_0 \cdot (U_i)^{\alpha_1} \cdot \exp(\alpha_2 W_i) \cdot \varepsilon_i,$   (6)


where $\varepsilon_i$ is the random variable and $\alpha_0$, $\alpha_1$, $\alpha_2$ are the
parameters that must be estimated.

If the values of the estimators $a_0$, $a_1$, $a_2$ of the parameters $\alpha_0$, $\alpha_1$,
$\alpha_2$ are determined based on the samples $(z_i)$, $(u_i)$, $(w_i)$, $i = 1, 2, \ldots, n$,
of the registered values of the carbon productivity, energy productivity and water
productivity, then Eq. (6) becomes:


$z_i = a_0 \cdot (u_i)^{a_1} \cdot \exp(a_2 w_i) \cdot v_i, \quad i = 1, 2, \ldots, n,$   (7)

                    | Rank   | Energy       | Carbon       | Water        | Waste        | Tax paid
                    |        | productivity | productivity | productivity | productivity |
Rank                |  1     |  0.028       | -0.008       | -0.029       | -0.068       | -0.008
Energy productivity |  0.029 |  1           |  0.932       |  0.768       |  0.610       |  0.114
Carbon productivity | -0.008 |  0.932       |  1           |  0.766       |  0.631       |  0.142
Water productivity  | -0.029 |  0.768       |  0.766       |  1           |  0.701       | -0.016
Waste productivity  | -0.068 |  0.610       |  0.631       |  0.701       |  1           | -0.128
Tax paid            | -0.008 |  0.114       |  0.142       | -0.016       | -0.128       |  1


Fig. 3. Spearman correlation coefficients
where $v_i$, $i = 1, 2, \ldots, n$, are the residual values.
Taking logarithms, Eq. (7) is transformed into:


$\ln(z_i) = \ln(a_0) + a_1 \ln(u_i) + a_2 w_i + \ln(v_i), \quad i = 1, 2, \ldots, n.$   (8)


Denoting by:


$z'_i = \ln(z_i),\ u'_i = \ln(u_i),\ A_0 = \ln(a_0),\ A_1 = a_1,\ A_2 = a_2,\ v'_i = \ln(v_i), \quad i = 1, 2, \ldots, n,$   (9)


equation (8) becomes:

$z'_i = A_0 + A_1 u'_i + A_2 w_i + v'_i, \quad i = 1, 2, \ldots, n.$   (10)


The same statistical tests were performed for checking the model quality. The R 
software was used for statistical analysis and modeling. 


3 Results 


Model 1. The values estimated for the parameters in Eqs. (4) and (5) are:
$B_0 = \ln(b_0) = 2.9445$, so $b_0 = 19.002$; $B_1 = b_1 = 0.9282$.


The values of the t-statistics and the corresponding p-values are, respectively: for
$b_0$: $t_0 = 9.805$, p-value $= 5.28 \times 10^{-16}$, and for $b_1$: $t_1 = 26.085$,
p-value $= 1.47 \times 10^{-44}$.

Since the p-values are less than 0.01, the parameters are significant.

The value of the F-statistic is 680.428 and the corresponding p-value is
$1.47 \times 10^{-44} < 0.01$, so the model as a whole is significant.

After the residual analysis, the normality and homoskedasticity hypotheses
could not be rejected, because the p-values associated with the Kolmogorov-Smirnov
and Levene tests are 0.029 and 0.924, respectively, both greater than 0.01.

The correlogram indicates that the hypothesis of residual autocorrelation
can be rejected.

The value of the determination coefficient $R^2$ is 0.885, so there is a strong linear
correlation between the variables in the model.

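Model 2. The values estimated for the parameters in Eqs. (9) and (10) are:


$A_0 = \ln(a_0) = 3.084$, so $a_0 = 21.849$; $A_1 = a_1 = 0.9092$; $A_2 = a_2 = 2.06 \times 10^{-6}$.


The results of the t-test on each coefficient in the model (10) are given in Fig. 4.
The p-values for $A_0$ and $A_1$ are less than 0.01, so these coefficients are significant
at the 1% level; the p-value for $A_2$ (0.02504) is below 0.05 but not below 0.01, so
$A_2$ is significant only at the 5% level.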
Coefficient | Value                  | t Stat | p-value
$A_0$       | 3.084                  |  8.328 | $1.06 \times 10^{-12}$
$A_1$       | 0.909                  | 19.974 | $2.98 \times 10^{-34}$
$A_2$       | $2.06 \times 10^{-6}$  |  2.280 | 0.02504

Fig. 4 Results of t-tests on the coefficients of the model (10)


The F test confirms that the model as a whole is significant, since the value of the
test statistic is 337.261 and the corresponding p-value is $1.04 \times 10^{-41} < 0.01$.

The hypotheses of normality and homoskedasticity could not be rejected, because
the p-values associated with the tests are 0.015 and 0.801, respectively, both greater
than 0.01. In conclusion, the model is statistically valid.


4 Conclusion 


Most managerially oriented green marketing research is insufficient to deliver the
changes necessary to achieve long-term social and environmental sustainability. This
analysis shows that sustainable business models consider a wide range of
stakeholder interests, which makes green marketing a complex, practice-based,
socio-material, performative endeavour.

The productivity accounting analysis indicates that energy and productivity are
the most significant drivers of companies’ sustainability. The substantial productivity
improvement of the most sustainable companies in the world in 2014 is attributable
more to high-tech industrial sectors. Heavy industry, characterized by high energy
emission levels, lags behind in terms of productivity and overall technical change.

The results of our models show high correlations between the
carbon productivity, water productivity and energy productivity. Comparing the
models, we find that the water productivity has a smaller influence than the energy
productivity on the carbon productivity, since the determination coefficient $R^2$
increased only slightly, from 0.885 to 0.893.

There are also significant correlations between the waste productivity and water
productivity, but their relationship varies strongly with the industry
type, so an extensive study by industry type should be done in the future.

There are also complex correlations among the variables considered for the
classification of the green companies, which arise in the adoption of strategies and in
the performance of green technology, and which must be analyzed in future work.
Multi-objective Evolutionary
Algorithms: Decomposition Versus
Indicator-Based Approach


Anata-Flavia Ionescu 


Abstract In the past decades, evolutionary algorithms have become a popular 
approach to multi-objective optimization problems. This chapter compares the per- 
formance of four state-of-the-art multi-objective evolutionary algorithms represent- 
ing two seminal approaches in recent research: indicator-based and decomposition- 
based. The algorithms are compared with respect to six convergence and diversity 
performance measures, including the recently proposed IGD+, for benchmark test 
problems in two popular suites. Experimental results are discussed. 


Keywords Multi-objective optimization - Evolutionary algorithm - IBEA - 
MOEA/D 


1 Introduction 


Multi-objective optimization has many applications in engineering and computer 
science, as well as in economy and game theory. Evolutionary approaches are among 
the most widely-used in solving multi-objective optimization problems, mainly due 
to the fact that they provide an approximation of the entire set of optimal solutions in 
a single run and to their generally lower sensitivity to the geometry of the solution set 
(see Coello [3] for a review). The evolutionary approach in solving MOPs often relies 
on genetic operators (most heavily on the selection operator) to ensure convergence 
and diversity at the same time. 

Two dominant directions of evolution were identified by Elarbi et al. [8] as composing
the latest wave in MOEA development: decomposition-based and indicator-based
MOEAs. The main idea of the indicator-based approach was first presented in
Zitzler and Künzli [19] and consists of defining variation operators which maximise
a given quality indicator. The second noteworthy approach that has grown in popularity
over the past decade, proposed by Zhang and Li [16], consists of decomposing
the MOP into a finite number of scalarized subproblems.

The chapter is structured as follows: in the Theory section, we present four state- 
of-the-art evolutionary algorithms; in the Results section we compare the algorithms 
with respect to six quality indicators for five bi-objective test instances in the ZDT 
suite and five tri-objective problems in the DTLZ suite; conclusions and perspectives 
for future research are discussed in the final section. 


2 Theory 


In this section, we provide the theoretical framework for the chapter and briefly
describe the original MOEA/D algorithm, as well as two of its more recent variants
which we used in our experimental study, and SMS-EMOA.


2.1 Background 


A general Multi-objective Optimization Problem (MOP) consists of minimizing a
vector function $F(x) = [f_1(x), \ldots, f_m(x)]^T$ of m components (the objectives), over
a search space of decision variables, usually denoted by $\Omega$, subject to inequality and
equality constraints.

Let Z denote the objective space. An objective vector (solution) $z^1 \in Z$ is said
to Pareto dominate another objective vector (solution) $z^2 \in Z$ ($z^1 \succ z^2$) iff $z^1$ is
at least as good as $z^2$ w.r.t. all objectives and better w.r.t. at least one objective.
Weak Pareto dominance denotes Pareto dominance or equality ($z^1 \succeq z^2$). Strict
Pareto dominance occurs when $z^1$ is better than $z^2$ for all objectives ($z^1 \succ\succ z^2$). A
solution z is Pareto optimal iff it is not Pareto dominated by any other solution in
the objective space Z. Note that this relationship is a partial ordering, because all
pairs of solutions in which one solution is better than the other w.r.t. one objective
and worse w.r.t. another objective are incomparable. The set of all Pareto optimal
solutions for a given MOP is called the Pareto front. The set of all decision variable
vectors x that map to the Pareto front is called the Pareto set [6, 20].
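
The dominance relation can be made concrete with a short sketch. It assumes minimization of every objective, so "better" means "smaller", and represents objective vectors as tuples; the function names are ours.

```python
def dominates(z1, z2):
    """True if z1 Pareto-dominates z2 under minimization:
    no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(z1, z2))
            and any(a < b for a, b in zip(z1, z2)))


def non_dominated(points):
    """Return the points that are not dominated by any other point in the list."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Small made-up example with two objectives.
demo_points = [(1, 3), (2, 2), (3, 1), (2, 3), (3, 3)]
demo_front = non_dominated(demo_points)
```

In this example `(1, 3)` and `(3, 1)` are incomparable, since each is better in one objective, while `(2, 3)` is dominated by both `(1, 3)` and `(2, 2)`. The quadratic scan is kept for clarity; faster non-dominated sorting procedures exist.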

A set of objective vectors $A \subseteq Z$ is called an approximation set if any two vectors
in A are incomparable w.r.t. Pareto dominance. Approximation sets are usually compared
using quality indicators. Generally, an n-ary quality indicator (most quality
indicators are binary or unary) is a mapping $I : \Omega^n \to \mathbb{R}$ (where $\Omega$ here
denotes the set of all approximation sets), which assigns a real value
to an ordered set of n approximation sets [20]. The value of the function represents
a specific performance measure for the approximation set(s).

MOEAs have long been dominated by what was later called the dominance-based
approach, in which selection depended on the Pareto dominance relationship. NSGA-II
[4] and its improved variants are popular exponents of this category. However,
recent research has been marked by two other major innovations: decomposition-based
and indicator-based algorithms.

Among the recent trends in MOEA development, decomposition is perhaps the 
most popular and seminal approach. The idea was first proposed by Zhang and 
Li [16], who used it in their Multi-objective Evolutionary Algorithm Based on 
Decomposition (MOEA/D), and uses sets of aggregation functions to define different 
search directions in the objective space. 

The main idea behind the indicator-based approach consists of guiding selection
by the maximization of a particular quality indicator, reflecting convergence to the
true Pareto front and/or diversity. This generic principle was first presented as part of
the Indicator-Based Evolutionary Algorithm (IBEA) by Zitzler and Künzli [19]. To
be categorized as an IBEA, an algorithm must basically use, as its fitness function,
the difference (loss) in the chosen quality indicator if an individual is to be removed
from the population. Individuals that minimize the loss in quality are eliminated at
each iteration.

In the following, we describe the original MOEA/D, two of its variants we used 
in our experiments (MOEA/D-DE, MOEA/D-STM), CDG-MOEA—another algo- 
rithm pertaining to the decomposition-based approach, as well as an indicator-based 
algorithm, SMS-EMOA. 

MOEA/D consists of decomposing the original MOP into a number of single-objective
optimization subproblems. The N subproblems are identified by N objective
weight vectors $\lambda^i$, $i \in \{1, \ldots, N\}$. The individuals in the population, at each
generation, are the current best solutions found by the algorithm for each subproblem.
Solutions to subproblems are obtained by optimizing a scalarizing function
based on the assigned weight vector. Popular approaches to scalarization include
the weighted sum approach, the Tchebycheff approach, and penalty-based boundary
intersection. For example, a variant of the Tchebycheff scalarization which has been
shown to improve the performance of decomposition-based MOEAs can be defined,
for the ith subproblem, as follows:


minimize $g^{te}(x \mid \lambda^i, z^*) = \max_{1 \le j \le m} \{ |f_j(x) - z_j^*| / \lambda_j^i \}$,
subject to $x \in \Omega$


where $z^*$ is the reference point, usually the ideal point. If $\lambda_j^i = 0$, it is replaced by
$10^{-6}$. Mating restrictions limit reproduction to solutions from the same neighborhood.
The pseudo-code for the algorithm is given in Algorithm 1.

Unlike the original MOEA/D, which used simulated binary crossover (SBX),
MOEA/D-DE [12] introduces the differential evolution (DE) crossover in an attempt
to address the issues of loss of diversity and inferior solutions generated by SBX.
MOEA/D-DE also introduces a parameter called $\delta$, the probability that mating for
the current solution to subproblem i is restricted to its neighborhood; the resulting
mating pool is denoted by P. First, a number rand is randomly generated from a
uniform distribution on the interval [0, 1].
Algorithm 1 MOEA/D


Require: MOP; stopping criterion; N, the number of subproblems; $\lambda^1, \ldots, \lambda^N$, the
    uniformly spread weight vectors
Ensure: external (archival) population EP
EP $\leftarrow \emptyset$
Compute Euclidean distances between all pairs of weight vectors
for i = 1, ..., N do
    Find the T closest weight vectors in terms of Euclidean distance
    $B_i \leftarrow \{i_1, \ldots, i_T\}$, where $\lambda^{i_1}, \ldots, \lambda^{i_T}$ are the closest T weight vectors to weight vector $\lambda^i$
end for
Generate initial population of feasible solutions $x^1, \ldots, x^N$
$FV^i \leftarrow F(x^i)$
$z \leftarrow (z_1, \ldots, z_m)^T$
repeat
    for i = 1, ..., N do
        Reproduction: Generate new solution y using genetic operators from $x^k$ and $x^l$, where
            $k, l \in B_i$ are randomly selected
        Repair: Apply a problem-specific repair method to solution y to obtain y'
        Update z:
        for j = 1, ..., m do
            if $z_j > f_j(y')$ then
                $z_j \leftarrow f_j(y')$
            end if
        end for
        Update Neighborhood:
        for each j in $B_i$ do
            if $g^{te}(y' \mid \lambda^j, z) \le g^{te}(x^j \mid \lambda^j, z)$ then
                $x^j \leftarrow y'$
                $FV^j \leftarrow F(y')$
            end if
        end for
        Update EP:
        $EP \leftarrow EP \setminus \{x \in EP \mid x \prec y'\}$
        if $\nexists x \in EP,\ y' \prec x$ then
            $EP \leftarrow EP \cup \{y'\}$
        end if
    end for
until Stopping criterion is met


$P = \begin{cases} B_i, & \text{if } rand < \delta \\ \{1, \ldots, N\}, & \text{otherwise} \end{cases}$

where $B_i = \{i_1, \ldots, i_T\}$, $\lambda^{i_1}, \ldots, \lambda^{i_T}$ being the closest T weight vectors to weight
vector $\lambda^i$. Then, three indices for the parent solutions are randomly selected from P.
Let $x^{r_1}, x^{r_2}, x^{r_3}$ be the three parent solutions. The offspring solution $\bar{y}$ is generated
component-wise as follows:


$\bar{y}_k = \begin{cases} x_k^{r_1} + F \times (x_k^{r_2} - x_k^{r_3}) & \text{with probability } CR \\ x_k^{r_1} & \text{with probability } 1 - CR \end{cases}$


where CR and F are control parameters from [0, 1].
As for the mutation operator, polynomial mutation is used. Polynomial mutation
is defined as follows:

$y_k = \begin{cases} \bar{y}_k + \sigma_k \times (b_k - a_k) & \text{with probability } p_m \\ \bar{y}_k & \text{with probability } 1 - p_m \end{cases}$


where $p_m$ is the mutation rate, $a_k$ and $b_k$ are the lower and upper bound for the k-th
decision variable, respectively, and $\sigma_k$ is defined as:


$\sigma_k = \begin{cases} (2 \times rand)^{1/(\eta+1)} - 1 & \text{if } rand < 0.5 \\ 1 - (2 - 2 \times rand)^{1/(\eta+1)} & \text{otherwise} \end{cases}$


where $\eta$ is the mutation distribution index and rand is randomly generated from a
uniform distribution on the interval [0, 1].
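
The two variation operators can be sketched as follows, assuming the usual MOEA/D-DE form in which the DE step perturbs the first parent by the scaled difference of the other two and the polynomial mutation uses the exponent 1/(η + 1). Function and parameter names are ours, and the final clipping to the variable bounds is an added assumption (a simple repair step), not something stated in the text.

```python
import random


def de_crossover(x1, x2, x3, F=0.5, CR=1.0, rng=random):
    """Component-wise DE step: x1 + F * (x2 - x3) with probability CR, else x1."""
    return [a + F * (b - c) if rng.random() < CR else a
            for a, b, c in zip(x1, x2, x3)]


def polynomial_mutation(y, lower, upper, p_m, eta=20.0, rng=random):
    """Polynomial mutation of each variable with probability p_m."""
    mutated = []
    for y_k, a_k, b_k in zip(y, lower, upper):
        if rng.random() < p_m:
            r = rng.random()
            if r < 0.5:
                sigma = (2.0 * r) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                sigma = 1.0 - (2.0 - 2.0 * r) ** (1.0 / (eta + 1.0))
            y_k = y_k + sigma * (b_k - a_k)
            # Assumption of this sketch: clip to the bounds as a simple repair.
            y_k = min(max(y_k, a_k), b_k)
        mutated.append(y_k)
    return mutated
```

With `CR = 1` every component takes the DE value, for example `de_crossover([1.0, 1.0], [3.0, 3.0], [1.0, 1.0])` gives `[2.0, 2.0]`. A larger distribution index `eta` concentrates the mutation step near zero, so offspring stay closer to the unmutated value.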

Stable Matching-based MOEA/D (MOEA/D-STM; Li et al. [11]) applies matching
theory to the selection operator in MOEA/D. Analogous to the stable marriage
problem defined by matching theory, where unstable matching occurs if and only if
there exists a man-woman pair who prefer each other over their assigned spouses,
MOEA/D-STM seeks to assign solutions to subproblems such that no subproblem-solution
pair would make a better match than the current one. The algorithm assumes
complete and transitive preference orderings for both subproblems and solutions. The
preferences of subproblems for solutions are given by the scalarizing functions of the
subproblems (whatever the approach to scalarizing), while preference orderings on
the part of the solutions are given by the Euclidean distances to the search directions
of the subproblems (the smaller the distance, the greater the preference for the
subproblem).

The two conditions that need to be met in order for the matching to be considered 
stable are: 


1. if a solution x is not paired with any subproblem, then no subproblem p should 
prefer x to its currently assigned pair; 

2. if a solution x is paired with a subproblem other than p, then x prefers its pair 
to p and vice-versa. 
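
One standard way to obtain a matching that satisfies these two conditions is the deferred-acceptance (Gale-Shapley) procedure, sketched below with subproblems proposing to solutions. This is an illustrative sketch under our own assumptions: the preference lists are given explicitly as index lists (best first), whereas in MOEA/D-STM they would be derived from scalarizing values and distances to search directions; the function name is ours.

```python
def stable_match(pref_sub, pref_sol):
    """Deferred acceptance with subproblems proposing.

    pref_sub[p]: solution indices ordered from most to least preferred by subproblem p
    pref_sol[s]: subproblem indices ordered from most to least preferred by solution s
    Returns a dict mapping each matched subproblem to its solution.
    """
    # Position of each subproblem in every solution's preference list.
    rank = [{p: pos for pos, p in enumerate(prefs)} for prefs in pref_sol]
    next_choice = [0] * len(pref_sub)   # next solution each subproblem will propose to
    holder = {}                         # solution -> subproblem currently holding it
    free = list(range(len(pref_sub)))
    while free:
        p = free.pop()
        if next_choice[p] >= len(pref_sub[p]):
            continue                    # p has exhausted its list and stays unmatched
        s = pref_sub[p][next_choice[p]]
        next_choice[p] += 1
        if s not in holder:
            holder[s] = p
        elif rank[s][p] < rank[s][holder[s]]:
            free.append(holder[s])      # s prefers p: the previous holder is released
            holder[s] = p
        else:
            free.append(p)              # s rejects p, which will try its next choice
    return {p: s for s, p in holder.items()}


# Made-up example: 2 subproblems, 3 candidate solutions.
demo_match = stable_match(
    pref_sub=[[0, 1, 2], [0, 2, 1]],
    pref_sol=[[1, 0], [0, 1], [0, 1]],
)
```

In the example both subproblems prefer solution 0, which in turn prefers subproblem 1; subproblem 0 therefore settles for solution 1, and solution 2 stays unpaired without any subproblem preferring it to its assigned solution.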


Constrained Decomposition MOEA with Grid (CDG-MOEA; Cai et al. [2]) was
proposed with the aim of addressing two major issues in extant MOEAs: loss of
diversity and sensitivity to the shape of the Pareto front. Considered a generalization
of the $\epsilon$-constraint method, CDG seeks to optimize one objective while keeping
all the other objectives between a lower and an upper bound. These bounds form
a grid system in which each objective j is divided into K intervals of size
$d_j = (z_j^{nad} - z_j^* + 2\sigma)/K$, where $z^*$ and $z^{nad}$ denote the ideal and nadir points.
Formally, for the l-th objective, $l \in \{1, \ldots, m\}$, we have a
single-objective optimization problem defined as follows:


minimize $f_l(x)$

subject to $g_j(x) = k_j,\ \forall j \in \{1, \ldots, m\},\ j \ne l$
           $k_j \in \{1, \ldots, K\}$
           $x \in \Omega$


where $g_j(x)$ is the grid coordinate of x with respect to objective j, given by
$g_j(x) = \lceil (f_j(x) - z_j^* + \sigma)/d_j \rceil$, $\sigma$ being a small positive number chosen such that
$g_j(x) \in \{1, \ldots, K\}$. The MOP is thus decomposed into $m \cdot K^{m-1}$ subproblems.
Mating is restricted, with a chosen probability $\delta$, to grid neighbors within a given
neighborhood distance T.
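
The grid coordinates can be illustrated with the sketch below. It assumes that the grid for each objective spans the range between a lower reference value and an upper reference value (called `z_star` and `z_nad` here, names chosen for this sketch), widened by a small margin `sigma` on both sides, and that coordinates are obtained by rounding up; the numbers are made up.

```python
import math


def grid_coordinates(f_values, z_star, z_nad, K, sigma=1e-6):
    """Grid coordinate (1..K) of an objective vector in each objective.

    f_values: objective values f_j(x)
    z_star, z_nad: assumed lower and upper reference values per objective
    K: number of grid intervals per objective
    sigma: small positive margin keeping coordinates inside 1..K
    """
    coords = []
    for f_j, low, high in zip(f_values, z_star, z_nad):
        d_j = (high - low + 2.0 * sigma) / K          # interval width
        coords.append(math.ceil((f_j - low + sigma) / d_j))
    return coords


# Made-up example: two objectives on [0, 1], four intervals each.
demo_corner = grid_coordinates((0.0, 1.0), (0.0, 0.0), (1.0, 1.0), K=4, sigma=0.01)
demo_inner = grid_coordinates((0.3, 0.7), (0.0, 0.0), (1.0, 1.0), K=4, sigma=0.01)
```

Thanks to the margin, a point sitting exactly on the lower reference value falls in cell 1 and a point on the upper reference value falls in cell K, so no coordinate leaves the range 1..K.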

A final algorithm we chose to include in our study is a specific example of IBEA:
the S-Metric Selection-based Evolutionary Multi-objective Algorithm (SMS-EMOA;
Beume et al. [1]). Here, the quality indicator used for selection is the
hypervolume indicator, also called $\mathcal{S}$-metric. Hypervolume is one of the most
frequently used quality indicators for evolutionary algorithms, which motivated Beume
et al. [1] to use it as a selection strategy. The $\mathcal{S}$-metric is defined as follows:


$\mathcal{S}(B, y_{ref}) = \Lambda\left(\bigcup_{y \in B} \{ y' \mid y_{ref} \prec y' \prec y \}\right), \quad B \subseteq \mathbb{R}^m$


where $\Lambda$ is the Lebesgue measure.

Hypervolume can thus be used to express the size of the objective space dominated
by an approximation set with respect to a given reference point. It can be demonstrated
that, given a finite search space and a reference point, maximization of the $\mathcal{S}$-metric
is equivalent to determining the Pareto-optimal set.

SMS-EMOA is a steady-state algorithm. Following a $(\mu + 1)$ selection scheme, at
each iteration, only one new individual (offspring) is generated and one individual is
discarded from the population. Specifically, the generated offspring may replace the
individual from the worst-ranked Pareto front obtained through the non-dominated
sorting algorithm which contributes the least hypervolume to that front. In order
to avoid loss of diversity, SMS-EMOA does not store an archival non-dominated
population. The pseudo-code for SMS-EMOA is given in Algorithm 2.

The reduction procedure (Algorithm 3) uses the non-dominated sort algorithm to
partition the population into v successive Pareto fronts $\mathcal{R}_1, \ldots, \mathcal{R}_v$. The ith Pareto front
contains solutions that are non-dominated provided that fronts j, $j < i$, have been
eliminated from the population. The algorithm then discards the individual of the
worst-ranked Pareto front through the elimination of which the loss of hypervolume
is minimal. The loss of hypervolume for the elimination of an individual s from front
i is given by:


$\Delta_{\mathcal{S}}(s, \mathcal{R}_i) = \mathcal{S}(\mathcal{R}_i, y_{ref}) - \mathcal{S}(\mathcal{R}_i \setminus \{s\}, y_{ref})$
Algorithm 2 SMS-EMOA
Require: $\mu > 0$; MOP; stopping criterion; reference point $y_{ref}$
Ensure: Pareto optimal set from generated population
Generate initial population of feasible solutions $x^1, \ldots, x^{\mu}$
repeat
    Reproduction: Generate new solution $q_{i+1}$ using genetic operators from $P_i$
    $P_{i+1}$ = Reduce($P_i \cup \{q_{i+1}\}$)
until Stopping criterion is met


Algorithm 3 Reduce(Q)

Require: Q

Ensure: Q
    $\{\mathcal{R}_1, \ldots, \mathcal{R}_v\} \leftarrow$ non-dominated sorting(Q)
    $r = \arg\min_{s \in \mathcal{R}_v} \Delta_{\mathcal{S}}(s, \mathcal{R}_v)$
    return $Q \setminus \{r\}$
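
The reduction step can be illustrated for the bi-objective case, where the hypervolume is an area that can be accumulated by sweeping the points in order of the first objective. The sketch below assumes minimization, objective vectors given as tuples, and a simple quadratic non-dominated sort; all names are ours, and this is not the jMetal implementation used in the experiments.

```python
def dominates(a, b):
    """True if a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def hypervolume_2d(points, ref):
    """Area dominated by 2-D points and bounded by the reference point ref."""
    inside = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in inside:
        if f2 < best_f2:                       # point adds a new horizontal slab
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def nondominated_sort(points):
    """Partition points into successive Pareto fronts (best front first)."""
    remaining, fronts = list(points), []
    while remaining:
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts


def reduce_population(Q, ref):
    """Drop the member of the worst front with the smallest hypervolume loss."""
    worst = nondominated_sort(Q)[-1]
    total = hypervolume_2d(worst, ref)

    def loss(s):
        rest = list(worst)
        rest.remove(s)
        return total - hypervolume_2d(rest, ref)

    r = min(worst, key=loss)
    reduced = list(Q)
    reduced.remove(r)
    return reduced


demo_hv = hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4))
demo_reduced = reduce_population([(1, 3), (2, 2), (3, 1), (1.9, 2.9)], ref=(4, 4))
```

In the second example all four points are mutually non-dominated, and `(1.9, 2.9)` is removed because it sits close to its neighbours and therefore contributes the smallest exclusive area. For three or more objectives the hypervolume computation is considerably more involved than this sweep.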


3 Results 


We compared the performances of the four algorithms for a selection of test instances 
from two popular multi-objective benchmark suites, with respect to five widely used 
performance metrics and a recently proposed modified version of IGD, the IGD+, 
which was shown to address some limitations of the original IGD. 

For the implementation of the metaheuristics, we used the Java-based jMetal
multi-objective optimization framework [13]. Algorithms were run on an Acer
machine with a 2.6 GHz Intel Core i5 processor and 4 GB of 1600 MHz DDR3L memory,
under the Windows 10 Pro operating system.


3.1 Test Instances 


We selected the five continuous test problems in the ZDT [18] suite, namely ZDT1- 
ZDT4 and ZDT6, and DTLZ1-4 and DTLZ6 from the DTLZ suite [5]. ZDT is a 
widely used bi-objective test suite. The DTLZ test problems are scalable to any 
number of objectives, and we chose their tri-objective variants. We did so for the 
following two reasons: first, many real-world scenarios, such as partner selection 
[14] are or may be reduced to three-objective problems, and second, four or more 
objectives fall under the category of many-objective optimization problems, which 
are outside the scope of this paper. 
3.2 Performance Indicators


The quality indicators we used in our study are listed below. 

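Unary additive $\epsilon$-indicator. The unary additive $\epsilon$-indicator [20] expresses the
minimum quantity $\epsilon$ by which each solution from an approximation set A must be
translated in order to “cover” the true Pareto front $PF_{true}$, that is, for each element
in $PF_{true}$ to be weakly dominated by at least one translated element of A:


$I_{\epsilon+}(A) = \inf_{\epsilon \in \mathbb{R}} \{ \forall z^1 \in PF_{true},\ \exists z^2 \in A : z^2 \succeq_{\epsilon+} z^1 \}$

where, for minimization, $z^2 \succeq_{\epsilon+} z^1$ iff $z^2_j \le \epsilon + z^1_j$ for all objectives j.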
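
For finite sets the infimum reduces to a max-min-max expression, which the following sketch evaluates directly. It assumes minimization, so a point of A covers a reference point once it has been shifted down by ε in every objective; the function name and the sample points are ours.

```python
def additive_epsilon(A, pf_true):
    """Unary additive epsilon indicator of A with respect to pf_true (minimization).

    For every reference point z, find the point a in A that needs the smallest
    uniform shift to weakly dominate z; the indicator is the largest such shift.
    """
    return max(
        min(max(a_j - z_j for a_j, z_j in zip(a, z)) for a in A)
        for z in pf_true
    )


# Made-up example with two objectives.
demo_front = [(1.0, 3.0), (3.0, 1.0)]
demo_eps = additive_epsilon([(1.5, 3.0), (3.0, 1.5)], demo_front)
```

Here each reference point is covered after a shift of 0.5, so the indicator is 0.5; an approximation set that contains the reference front itself obtains the value 0.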


Generational Distance (GD) expresses the average distance between each solution
in the approximation set A and the nearest point in the true Pareto front $PF_{true}$
and is given by:

$GD(A) = \frac{\sum_{i=1}^{|A|} \min_{p \in PF_{true}} d(a_i, p)}{|A|}$


where d(x, y) is the Euclidean distance between points x and y. 

Inverted Generational Distance (IGD; sometimes called distance from representatives
in the Pareto front, or D-metric) is the average distance between each
member of $PF_{true}$ and the nearest solution in A:


$IGD(A) = \frac{\sum_{i=1}^{|PF_{true}|} \min_{a_j \in A} d(z_i, a_j)}{|PF_{true}|}$

IGD+ [10] is a modified version of the original IGD, weakly Pareto-compliant
and less sensitive to the specification of reference points. The Euclidean distance
used in IGD is replaced by the following metric:


$d^{+}(x, y) = \sqrt{\sum_{k=1}^{m} (\max\{y_k - x_k, 0\})^2}$
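
The three distance-based indicators differ only in the direction of the averaging and in the distance used, as the sketch below shows. It follows the averaged forms described in the text and assumes minimization, with the first argument of the modified distance being a reference point and the second a solution; function names and sample points are ours.

```python
import math


def gd(A, pf_true):
    """Average distance from each solution in A to its nearest reference point."""
    return sum(min(math.dist(a, p) for p in pf_true) for a in A) / len(A)


def igd(A, pf_true):
    """Average distance from each reference point to its nearest solution in A."""
    return sum(min(math.dist(z, a) for a in A) for z in pf_true) / len(pf_true)


def d_plus(z, a):
    """Modified distance: only objectives where a is worse than z contribute."""
    return math.sqrt(sum(max(a_k - z_k, 0.0) ** 2 for z_k, a_k in zip(z, a)))


def igd_plus(A, pf_true):
    """IGD computed with the modified distance d_plus."""
    return sum(min(d_plus(z, a) for a in A) for z in pf_true) / len(pf_true)


# Made-up example: one reference point and one solution that is worse in the
# first objective but better in the second.
demo_igd = igd([(2.0, 0.5)], [(1.0, 1.0)])
demo_igd_plus = igd_plus([(2.0, 0.5)], [(1.0, 1.0)])
```

In the example IGD penalizes the solution for differing from the reference point in both objectives, whereas IGD+ counts only the objective in which the solution is actually worse, giving 1.0 instead of about 1.118.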


Hypervolume (HV) is defined as in Beume et al. [1]. HV is reflective of both 
convergence and diversity. 

Spread ($\Delta$) is a measure of solution set diversity. It can be calculated as the 
deviation of minimum distances from the Pareto front to the approximation set $A$ 
[17]: 

$$\Delta = \frac{\sum_{i=1}^{m} d(e_i, A) + \sum_{z \in PF_{true}} \left| d(z, A) - \bar{d} \right|}{\sum_{i=1}^{m} d(e_i, A) + |PF_{true}| \cdot \bar{d}}$$

where $\{e_1, \ldots, e_m\}$ are the extreme solutions in $PF_{true}$, $d(z, A)$ is the distance 
between point $z$ in $PF_{true}$ and the nearest point in $A$, and $\bar{d}$ is the average of these 
distances. $\Delta$ is a measure of diversity and the ideal value of 0 for this indicator would 
mean that the approximation set is equally spaced. 
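
A direct transcription of the spread formula might look as follows. This is only a sketch: it assumes Euclidean nearest-point distances, takes the extreme solutions as an explicit argument, and returns 0 when the denominator vanishes, which is a convention chosen here rather than one stated in the text.

```python
import math


def _dist(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def nearest(z, approx):
    """d(z, A): distance from z to the nearest point of the approximation set."""
    return min(_dist(z, a) for a in approx)


def spread(approx, front, extremes):
    """Spread indicator computed from the reference front and its extreme points."""
    d = [nearest(z, approx) for z in front]
    d_mean = sum(d) / len(d)
    d_ext = sum(nearest(e, approx) for e in extremes)
    denominator = d_ext + len(front) * d_mean
    if denominator == 0.0:  # approximation set coincides with the front
        return 0.0
    return (d_ext + sum(abs(di - d_mean) for di in d)) / denominator


if __name__ == "__main__":
    front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
    extremes = [(0.0, 1.0), (1.0, 0.0)]
    print(spread([(0.5, 0.5)], front, extremes))
```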


3.3 Parameter Settings 


For differential evolution crossover, CR = 1 and F = 0.5 were the values used for 
the control parameters. For all polynomial mutation operators, we set the mutation 
probability to $p_m = 1/n$, where $n$ is the number of decision variables, and the 
distribution index to $\eta_m = 20$. All algorithms were run 30 times, and the stopping criterion 
was a maximum of 25000 evaluations. 
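
To make the role of CR, F, $p_m$ and $\eta_m$ concrete, the sketch below shows a common textbook form of the two variation operators: a DE/rand/1 trial vector and the basic (unbounded-form) polynomial mutation with clipping. It is an assumption-laden illustration, not the exact operators of the compared implementations; the function names and the clipping repair are choices made here.

```python
import random


def de_child(x1, x2, x3, lower, upper, F=0.5, CR=1.0, rng=random):
    """DE trial vector: x1 + F * (x2 - x3) per variable with probability CR."""
    child = []
    for k in range(len(x1)):
        if rng.random() < CR:
            v = x1[k] + F * (x2[k] - x3[k])
        else:
            v = x1[k]
        child.append(min(max(v, lower[k]), upper[k]))  # repair by clipping
    return child


def polynomial_mutation(x, lower, upper, eta_m=20.0, p_m=None, rng=random):
    """Basic polynomial mutation; p_m defaults to 1/n as in the settings above."""
    n = len(x)
    p_m = 1.0 / n if p_m is None else p_m
    y = list(x)
    for k in range(n):
        if rng.random() < p_m:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta_m + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1.0))
            y[k] = min(max(y[k] + delta * (upper[k] - lower[k]), lower[k]), upper[k])
    return y


if __name__ == "__main__":
    lo, hi = [0.0, 0.0], [1.0, 1.0]
    child = de_child([0.2, 0.4], [0.6, 0.8], [0.1, 0.3], lo, hi)
    print(child, polynomial_mutation(child, lo, hi))
```

With CR = 1 every variable of the child comes from the differential term, so the mutation step is the only source of small independent perturbations.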

The population size was set to N = 100 for all algorithms. For MOEA/D-DE 
and MOEA/D-STM, parameter T was set to 20, the upper bound for the number of 
solutions replaced by a child solution, $n_r$, was set to 2, and parameter $\delta$ to 0.9, as 
in their references. For CDG-MOEA, parameter K (number of grid divisions) was 
set to 180 for the bi-objective test problems and to 25 for all tri-objective instances. 
Parameter T (the maximum grid distance for neighborhood) was set to 1, and $\delta$ 
(probability of mating restriction to grid neighbors) to 0.9. 


3.4 Experimental Results 


Comparisons between the algorithms in terms of the six performance indicators on 
the ZDT test instances are presented in Tables 1, 2, 3, 4, 5, and 6. For each test 


Table 1 ZDT suite. Mean and standard deviation (in parentheses) of the ε indicator 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
ZDT1 | 2.59e-02 (7.9e-03)* | 5.58e-01 (?.?e-01)* | 3.75e-01 (4.2e-01)* | 5.70e-03 (2.2e-04) 
ZDT2 | 4.57e-02 (1.7e-02)* | 9.63e-01 (1.9e-01)* | 9.79e-01 (3.4e-01)* | 3.05e-02 (6.9e-02) 
ZDT3 | 1.03e-01 (3.5e-02)* | 4.67e-01 (?.4e-01)* | 4.33e-01 (1.0e-01)* | 1.49e-02 (4.4e-02) 
ZDT4 | 5.70e-01 (3.4e-01)* | 5.77e-01 (1.5e-01)* | 2.75e+01 (1.9e+01)* | 8.26e-02 (7.0e-02) 
ZDT6 | 6.55e-03 (1.7e-03) | 1.40e-02 (2.2e-02) | 5.18e-02 (6.9e-02)* | 8.85e-03 (5.8e-04)* 




A.-F. Ionescu 


Table 2 ZDT suite. Mean and standard deviation (in parentheses) of GD 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
ZDT1 | 1.62e-03 (5.4e-04)* | 3.31e-04 (1.6e-04)* | 3.79e-02 (1.1e-02)* | 6.79e-05 (1.1e-05) 
ZDT2 | 1.44e-03 (4.6e-04)* | 6.16e-05 (1.3e-04) | 5.85e-02 (2.2e-02)* | 4.97e-05 (6.6e-06) 
ZDT3 | 3.54e-03 (1.1e-03)* | 6.29e-04 (3.6e-04)* | 2.74e-02 (8.1e-03)* | 8.18e-05 (2.6e-05) 
ZDT4 | 5.72e-02 (4.8e-02)* | 1.02e-02 (2.2e-02)* | 1.33e+01 (5.7e+00)* | 3.40e-04 (2.6e-04) 
ZDT6 | 6.25e-05 (1.1e-04) | 6.99e-05 (1.2e-04) | 1.41e-01 (9.3e-02)* | 2.51e-04 (3.3e-05)* 

Table 3 ZDT suite. Mean and standard deviation (in parentheses) of IGD 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
ZDT1 | 5.48e-04 (1.7e-04)* | 1.62e-02 (3.4e-03)* | 1.08e-02 (3.4e-03)* | 1.35e-04 (8.8e-07) 
ZDT2 | 4.60e-04 (?.3e-04)* | 2.18e-02 (4.5e-03)* | 1.96e-02 (7.9e-03)* | 3.30e-04 (5.5e-04) 
ZDT3 | 1.81e-03 (6.8e-04)* | 7.96e-03 (3.1e-03)* | 8.50e-03 (1.9e-03)* | 2.62e-04 (6.2e-04) 
ZDT4 | 1.41e-02 (9.9e-03)* | 1.56e-02 (4.4e-03)* | 8.58e-01 (5.7e-01)* | 1.57e-03 (1.6e-03) 
ZDT6 | 1.44e-04 (9.2e-05) | 1.67e-04 (9.6e-05) | 8.28e-04 (1.7e-03)* | 1.60e-04 (5.4e-06)* 

Table 4 ZDT suite. Mean and standard deviation (in parentheses) of IGD+ 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
ZDT1 | 1.68e-02 (5.2e-03)* | 2.58e-01 (7.9e-02)* | 3.42e-01 (?.?e-01)* | 2.41e-03 (1.7e-05) 
ZDT2 | 1.24e-02 (3.9e-03)* | 3.19e-01 (6.2e-02)* | 6.08e-01 (?.?e-01)* | 3.73e-03 (5.9e-03) 
ZDT3 | 4.90e-02 (1.3e-02)* | 8.86e-02 (4.0e-02)* | 2.59e-01 (6.3e-02)* | 1.97e-03 (2.8e-03) 
ZDT4 | 4.29e-01 (?.?e-01)* | 2.87e-01 (1.2e-01)* | 2.71e+01 (1.9e+01)* | 1.50e-02 (1.3e-02) 
ZDT6 | 2.71e-03 (8.1e-04) | 2.92e-03 (1.2e-03)* | 1.58e-02 (4.1e-02)* | 4.36e-03 (2.6e-04)* 

Table 5 ZDT suite. Mean and standard deviation (in parentheses) of HV 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
ZDT1 | 6.40e-01 (7.4e-03)* | 4.07e-01 (7.8e-02)* | 2.48e-01 (1.0e-01)* | 6.62e-01 (2.8e-05) 
ZDT2 | 3.10e-01 (6.2e-03)* | 1.49e-02 (6.1e-02)* | 1.78e-02 (3.1e-02)* | 3.27e-01 (5.7e-03) 
ZDT3 | 4.40e-01 (2.3e-02)* | 3.99e-01 (5.5e-02)* | 2.11e-01 (6.2e-02)* | 5.16e-01 (9.0e-04) 
ZDT4 | 2.42e-01 (2.?e-01) | 3.72e-01 (1.3e-01)* | 0.00e+00 (0.0e+00)* | 6.48e-01 (1.3e-02) 
ZDT6 | 4.01e-01 (4.4e-03) | 4.01e-01 (1.9e-03) | 3.85e-01 (3.4e-02)* | 3.98e-01 (4.7e-04)* 

Table 6 ZDT suite. Mean and standard deviation (in parentheses) of SPREAD 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
ZDT1 | 3.77e-01 (4.4e-02)* | 8.45e-01 (7.7e-02)* | 8.65e-01 (7.5e-02)* | 1.16e-01 (1.3e-02) 
ZDT2 | 3.28e-01 (7.8e-02)* | 9.69e-01 (1.5e-01)* | 8.54e-01 (4.9e-02)* | 1.88e-01 (9.0e-02) 
ZDT3 | 9.96e-01 (?.?e-02)* | 9.67e-01 (1.9e-02)* | 1.08e+00 (7.6e-02)* | 7.35e-01 (2.9e-03) 
ZDT4 | 9.89e-01 (1.3e-01)* | 9.46e-01 (?.4e-01)* | 1.07e+00 (1.4e-01)* | 3.03e-01 (1.0e-01) 
ZDT6 | 1.55e-01 (1.0e-02) | 1.59e-01 (2.1e-02) | 1.01e+00 (2.7e-01)* | 1.66e-01 (1.4e-02)* 


problem, written in bold on gray background are the best values of the indicators, 
and a light gray background is used to highlight the second-best values. 

Wilcoxon’s rank sum tests with an $\alpha$ level set to 0.05 were conducted to make 
pairwise comparisons between the algorithms in terms of all indicator values. Asterisks 
mark the indicator values that were found to be significantly worse than the one 
obtained by the best-performing algorithm for the problem on the row. 
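
The pairwise comparison can be illustrated with a small rank-sum routine. The sketch below uses the normal approximation without tie or continuity correction, which is an assumption made here for brevity; statistical packages typically offer a more careful version, and the function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns the z statistic of sample x and the two-sided p-value.
    """
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied values
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    z = (w - mean) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    runs_a = [float(v) for v in range(1, 31)]   # 30 illustrative indicator values
    runs_b = [float(v) for v in range(31, 61)]
    z, p = rank_sum_test(runs_a, runs_b)
    print(z, p, p < 0.05)
```

With 30 runs per algorithm, as in the settings above, the normal approximation is usually considered adequate.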

It can be seen in the tables and in Fig. 1 that SMS-EMOA performed best for all 
instances except ZDT6, where the best mean value of all indicators was obtained 
by MOEA/D-DE. ZDT6 is characterized by a concave, non-uniform Pareto front 
distribution. The smallest values of the standard deviation (reflective of algorithm 
stability) were obtained by SMS-EMOA for most instances, even ZDT6. For ZDT6, 
MOEA/D-DE outperformed SMS-EMOA in terms of standard deviation only for the 
spread indicator. Overall, the poorest results for the entire ZDT suite were the ones obtained 
by CDG-MOEA. 

Tables 7, 8, 9, 10, 11, and 12 show the performances of the compared algorithms on 
the DTLZ test suite. Figures 2 and 3 plot the Pareto fronts of the test instances versus 
the approximations provided by the algorithms that obtained the best performance 
for the greatest number of quality indicators. 

SMS-EMOA performed best in terms of both mean and standard deviation values 
of all indicators for DTLZ1-2 (except IGD for DTLZ2). For DTLZ3—a problem 
characterized by many local fronts parallel to the global front, it outperformed the 
other algorithms in terms of convergence indicators, but it was outperformed in terms 
of HV and spread by MOEA/D-DE and CDG-MOEA, respectively. Also, as can be 
observed in Fig. 2, the approximation it provided for DTLZ3’s Pareto front—though 
the best compared to the other algorithms in this study—was fairly poor. It was 
outperformed in both aspects by other algorithms for problems DTLZ4 and DTLZ6. 

For problems DTLZ4 and DTLZ6, the best performances were achieved by CDG- 
MOEA and MOEA/D-STM, respectively. DTLZ4 is a non-uniformly distributed 
problem (with biased density), and DTLZ6 is disconnected in both its Pareto set and 
Pareto front. The true Pareto fronts and the best approximations are plotted in Fig. 3. 

Fig. 1 Final approximation sets with median IGD+ values obtained by the four algorithms on the 
ZDT test suite 


Table 7 DTLZ suite. Mean and standard deviation (in parentheses) of the ε indicator 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
DTLZ1 | 2.54e-01 (5.2e-01)* | 1.60e-01 (1.6e-01)* | 1.03e+01 (6.8e+00)* | 5.72e-02 (1.9e-03) 
DTLZ2 | 1.53e-01 (1.0e-02)* | 1.59e-01 (1.0e-02)* | 1.40e-01 (5.0e-02)* | 5.06e-02 (1.1e-03) 
DTLZ3 | 4.76e+00 (9.3e+00) | 5.85e+00 (8.4e+00) | 4.23e+01 (3.2e+01)* | 9.14e-01 (7.4e-01) 
DTLZ4 | 1.67e-01 (5.7e-02)* | 2.76e-01 (2.9e-01)* | 1.26e-01 (3.0e-02) | 4.52e-01 (3.6e-01) 
DTLZ6 | 2.51e-02 (1.1e-04) | 2.51e-02 (6.9e-05) | 9.18e-01 (6.7e-01)* | 8.80e-02 (5.0e-02)* 

Table 8 DTLZ suite. Mean and standard deviation (in parentheses) of GD 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
DTLZ1 | 3.17e-02 (1.1e-01)* | 9.60e-03 (3.2e-02)* | 2.02e+01 (1.1e+01)* | 1.13e-03 (1.4e-03) 
DTLZ2 | 9.14e-04 (4.5e-05)* | 8.63e-04 (3.4e-05)* | 3.58e-03 (1.2e-03)* | 7.32e-04 (1.9e-05) 
DTLZ3 | 8.16e-01 (1.6e+00) | 9.70e-01 (1.4e+00) | 4.37e+01 (2.4e+01)* | 6.48e-01 (1.0e+00) 
DTLZ4 | 6.52e-03 (4.4e-04)* | 5.49e-03 (2.2e-03)* | 1.32e-02 (6.7e-03)* | 3.83e-03 (2.0e-03) 
DTLZ6 | 6.29e-04 (3.3e-06) | 6.28e-04 (1.6e-06) | 3.11e-01 (4.8e-01)* | 1.02e-02 (6.1e-03)* 

Table 9 DTLZ suite. Mean and standard deviation (in parentheses) of IGD 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
DTLZ1 | 2.37e-03 (7.5e-03)* | 1.06e-03 (2.2e-03)* | 1.19e-01 (7.8e-02)* | 4.28e-04 (4.3e-06) 
DTLZ2 | 8.21e-04 (7.0e-06)* | 8.21e-04 (4.0e-05)* | 7.10e-04 (7.3e-05) | 8.35e-04 (1.4e-05)* 
DTLZ3 | 1.02e-01 (9.1e-01) | 1.18e-01 (4.7e-01) | 7.82e-01 (5.7e-01)* | 1.40e-02 (1.2e-02) 
DTLZ4 | 1.31e-03 (2.4e-04) | 2.37e-03 (2.2e-03) | 1.43e-03 (3.2e-04) | 4.86e-03 (3.4e-03) 
DTLZ6 | 8.75e-05 (1.9e-07) | 8.75e-05 (1.1e-07) | 6.25e-03 (5.1e-03)* | 5.08e-04 (9.5e-04)* 

Table 10 DTLZ suite. Mean and standard deviation (in parentheses) of IGD+ 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
DTLZ1 | 2.16e-01 (7.5e-01)* | 8.74e-02 (2.2e-01)* | 1.19e+01 (7.9e+00)* | 2.88e-02 (?.5e-04) 
DTLZ2 | 3.73e-02 (5.0e-04)* | 3.69e-02 (3.9e-04)* | 4.38e-02 (5.9e-03)* | 1.95e-02 (1.2e-04) 
DTLZ3 | 6.41e+00 (1.3e+01) | 7.46e+00 (1.1e+01) | 4.94e+01 (3.6e+01)* | 7.86e-01 (7.9e-01) 
DTLZ4 | 1.68e-02 (5.8e-03) | 4.04e-02 (5.4e-02) | 5.97e-02 (2.5e-02)* | 7.30e-02 (6.1e-02) 
DTLZ6 | 4.93e-03 (1.3e-05) | 4.93e-03 (6.5e-06) | 1.03e+00 (8.5e-01)* | 6.03e-02 (4.4e-02)* 

Table 11 DTLZ suite. Mean and standard deviation (in parentheses) of HV 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
DTLZ1 | 6.69e-01 (1.9e-01)* | 7.03e-01 (1.4e-01)* | 6.12e-03 (2.4e-02)* | 7.87e-01 (4.2e-03) 
DTLZ2 | 3.64e-01 (2.1e-03)* | 3.65e-01 (1.9e-03)* | 3.79e-01 (5.2e-03)* | 4.27e-01 (1.5e-04) 
DTLZ3 | 1.59e-01 (1.6e-01) | 1.13e-01 (1.6e-01) | 0.00e+00 (0.0e+00)* | 8.27e-02 (1.3e-01) 
DTLZ4 | 3.41e-01 (3.5e-02) | 2.99e-01 (1.2e-01) | 3.56e-01 (1.2e-02) | 2.65e-01 (1.5e-01) 
DTLZ6 | 9.31e-02 (1.2e-05) | 9.31e-02 (5.1e-06) | 2.39e-02 (3.9e-02)* | 6.14e-02 (1.6e-02)* 

Table 12 DTLZ suite. Mean and standard deviation (in parentheses) of SPREAD 

Problem | MOEA/D-DE | MOEA/D-STM | CDG | SMS-EMOA 
DTLZ1 | 8.41e-01 (?.6e-02)* | 8.31e-01 (5.2e-02)* | 6.99e-01 (2.3e-01)* | 5.27e-01 (2.7e-02) 
DTLZ2 | 8.19e-01 (2.8e-02)* | 8.27e-01 (2.6e-02)* | 5.47e-01 (3.8e-02)* | 4.50e-01 (3.1e-02) 
DTLZ3 | 9.37e-01 (1.1e-01)* | 1.03e+00 (?.9e-01)* | 5.91e-01 (1.9e-01) | 1.02e+00 (3.7e-01)* 
DTLZ4 | 9.79e-01 (1.9e-01)* | 9.24e-01 (1.6e-01)* | 5.44e-01 (4.7e-02) | 5.62e-01 (2.0e-01) 
DTLZ6 | 7.21e-01 (2.3e-03)* | 7.20e-01 (1.2e-03)* | 5.12e-01 (1.3e-01) | 8.19e-01 (1.6e-01)* 


It should also be noted that, overall, CDG-MOEA provided the best values of the 
spread indicator for the DTLZ suite (best for DTLZ3, 4, and 6 and second-best for 
DTLZ1-2). 


4 Conclusion and Perspectives 


The indicator-based SMS-EMOA showed a remarkably good overall performance 
on the two benchmark suites we used in our study. For the GD and HV indicators, 
SMS-EMOA also yielded the smallest standard deviation values on all the problems 
in the ZDT suite, as well as on DTLZ1-3. This indicates that SMS-EMOA is a very 
stable algorithm for the parameter settings we used. 

Nevertheless, decomposition-based approaches have focused more and more on 
addressing loss of diversity, which is clearly visible if we look at the performance 
of CDG-MOEA on the DTLZ problems in terms of spread. Moreover, algorithms in 
this category have also been modified to provide better approximations of problems 
with more complicated Pareto sets and/or fronts, which is reflected by the better 
performance of these algorithms on the DTLZ suite as compared to the simpler ZDT 
problems. 

Finally, we should also note that attempts to improve diversity and deal with more 
complicated Pareto front shapes and/or less uniform Pareto set and front distributions 
(such as in the case of MOEA/D-STM and CDG-MOEA) may result in slower 
convergence and failure to rapidly provide good approximations for simpler MOPs. 

Fig. 2 Final approximation sets with median IGD+ values obtained by the best performing 
algorithm (SMS-EMOA) on DTLZ1-3 


The results of the present experimental study should be interpreted with caution 
due to several limitations. First, we chose two test problem suites which are still 
popular, but were subject to criticism (see, e.g., Huband et al. [9]) for covering a 
limited set of aspects possibly relevant to algorithm performance, especially on 
real-world MOPs. Algorithms such as CDG-MOEA were shown to perform very well 
on test suites such as Gu et al.’s GLT [7], exhibiting features (e.g., disconnected, 
degenerate, extremely convex front, etc.) that may appear in real-world scenarios 
and make it more difficult for algorithms to deliver good approximations. Second, 
we only conducted experiments for one configuration of parameter settings, using a 
small population size and a small number of evaluations. 

Fig. 3 Final approximation sets with median IGD+ values obtained by the best performing 
algorithms on DTLZ4 and DTLZ6 

As the indicator-based and the decomposition-based approaches have been shown to 
offer complementary benefits (the first optimizing a given quality measure and the 
second primarily optimizing diversity), future research should perhaps focus more on 
algorithms that combine the two approaches. Several such algorithms have already 
been proposed (see, e.g., Yang et al.’s [15] solution), but more experimental stud- 
ies are needed to determine their sensitivity to parameter settings and test problem 
features, and to identify aspects that warrant improvement. 

Abstract The aim of this chapter is to study three 2-dimensional geometries, namely 
Riemannian, Kähler and Hessian, in a unitary way by using three (local) Hermitian 
matrices. One of these matrices corresponds to the symmetric matrix of the metric while 
the other two Hermitian matrices are provided by the Christoffel symbols. The secondary 
diagonals of these Hermitian matrices are generated by the Hopf invariant and 
its conjugate, where for this notion we adopt the definition of Jensen et al. (Surfaces 
in classical geometries. A treatment by moving frames. Springer, Cham, 2016 [13]). 
In the Riemannian case a special view is towards an expression of the Gaussian 
curvature in terms of these data while in Kähler and Hessian geometry we use the 
corresponding potential function and a new (again local) differential operator of first 
order, similar to $\partial$. 


1 Introduction 


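In the very recent paper [5] we associate to a symmetric (real) matrix of order two: 

$$\Gamma = \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{12} & \Gamma_{22} \end{pmatrix} \tag{1.1}$$

the Hermitian matrix of the same order: 

$$\Gamma^c = \begin{pmatrix} B & 2A \\ 2\bar{A} & B \end{pmatrix}, \quad 2A = \frac{\Gamma_{11} - \Gamma_{22}}{2} - i\Gamma_{12}, \quad 2B = \Gamma_{11} + \Gamma_{22} = \mathrm{Tr}\,\Gamma. \tag{1.2}$$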

From the linear algebra point of view the bijective association $\Gamma \leftrightarrow \Gamma^c$ is useful 
since it preserves both the trace and the determinant of the initial matrix: 

$$\mathrm{Tr}\,\Gamma = \mathrm{Tr}\,\Gamma^c, \quad \det \Gamma = \det \Gamma^c \tag{1.3}$$

and then these matrices are equivalent. 

In fact, if $\Gamma$ is the matrix of the second fundamental form of a surface then 
the expression $2A$ was considered by H. Hopf in his 1956 Lectures from Stanford 
University and appears at page 137 in the book [12]. Lemma 2.2 from page 
140 of Hopf’s book is the characterization of CMC surfaces in terms of holomorphy 
of this complex valued function. For this reason, in [13, p. 56] this expression is 
called the Hopf invariant and is denoted by $h$. Therefore we define for the general 
symmetric matrix $\Gamma$: 

(i) the Hopf invariant: $h(\Gamma) := 2A$, (ii) the mean curvature: $H(\Gamma) := B$, and 
hence the associated Hermitian matrix becomes: 

$$\Gamma^c = \begin{pmatrix} H & h \\ \bar{h} & H \end{pmatrix} \tag{1.4}$$

The aim of this chapter is to extend this approach to two-dimensional Riemannian, 
Kähler and Hessian geometries on a manifold $M$. We obtain three Hermitian 
matrices of smooth (local) functions associated to a given metric $g$ and its Levi-Civita 
connection $\nabla$. As an application, in the Riemannian case we produce new expressions 
of the Gaussian curvature $K(g)$ in terms of these complex-valued functions. 
The isothermal and warped metrics are the main Riemannian examples and a special 
formula is obtained in normal coordinates around a fixed point $p \in M$. In the isothermal 
case the holomorphy of the Hopf-Levi-Civita invariants $h^1$, $h^2$ is characterized by 
the flatness of the metric $g$. In the warped case the real-valued Gaussian curvature is 
obtained from the product of two purely imaginary functions in Formula (2.26). Also, 
the Hamilton cigar soliton, a projective metric and generalized Poincaré metrics are 
considered. 

The second section is concerned with regular oriented surfaces M in the Euclidean 
3-dimensional geometry. In addition to the symmetric matrices of induced metric and 
corresponding Christoffel symbols we have the symmetric matrices of the second 
fundamental form b and the shape (Weingarten) operator S. Since the Hopf charac- 
terization of CMC surfaces through the holomorphy of the Hopf invariant h(b) is 
proved by using isothermal coordinates we work in a general (non-isothermal) chart 
and obtain only a necessary condition for the holomorphy of both h(g) and h(b). 
The surfaces with holomorphic $h^b$ are obtained and the transport of this holomorphy 
from $h^b$ to $h^g$ through the (vertical) Wick rotation $y \to iy$ is studied. 

This technique extends naturally to Kähler and Hessian geometries. For our computations 
we use their common feature of being generated by a potential p, and the 
two-dimensional unit disk is treated in both geometries. We remark the necessity to define 
a (local) differential operator of first order similar to $\partial$ used in complex analysis 
of a single variable. 

Inspired by the Hessian geometry we return to the general two-dimensional metrics 
by studying the Hopf invariants of two other remarkable symmetric covariant 
fields: the Hessian of a given smooth function [7] and the Lie derivative of the metric 
with respect to a given vector field. In the last case we establish, via the Hopf 
invariant, a relationship between (conformal) Killing vector fields $X = (X^1, X^2)$ and 
the holomorphy of the function $f_X = X^1 + iX^2$ for the case of isothermal metrics. 
We compute these holomorphic $f_X$ for all constant curvature metrics and for the 
Hamilton cigar soliton. 

The last section studies the transformation of the Christoffel-Hopf data in the case 
of two Riemannian metrics (or general symmetric linear connections) in subgeodesics 
correspondence. As important particular cases we mention the geodesic or projective 
correspondence and the conformal correspondence. We finish with the determination 
of a flat and projective Euclidean connection. 


2 The Hopf-Levi-Civita Data for Two-Dimensional 
Riemannian Metrics 


Fix a two-dimensional Riemannian manifold $(M^2, g)$ and an atlas on it providing the 
local coordinates $(x, y) = (x^1, x^2)$ on any local chart $U$. Then the restricted metric 
on $U$ is expressed as $g|_U = (g_{ij}(x, y))_{i,j=1,2}$ with $g_{ij} \in C^\infty(U, \mathbb{R})$. The discussion 
above suggests the following notion: 


Definition 2.1. The Hermitian data of the pair $(U, g|_U)$ is: 

$$\left( h^g := \frac{g_{11} - g_{22}}{2} - i g_{12} \in C^\infty(U, \mathbb{C}), \quad H^g := \frac{g_{11} + g_{22}}{2} \in C^\infty(U, \mathbb{R}) \right) \tag{2.1}$$

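The Hermitian data is useful to express the Beltrami coefficient $\mu_g$ of $g$. Namely, 
after [25, p. 142] the expression of the metric $g$ in the complex coordinate $z := x + iy$ 
is: 

$$g = \lambda_g |dz + \mu_g d\bar{z}|^2, \quad \lambda_g = \frac{1}{4}\left(g_{11} + g_{22} + 2\sqrt{\det g}\right) > 0, \quad \mu_g = \frac{g_{11} - g_{22} + 2i g_{12}}{g_{11} + g_{22} + 2\sqrt{\det g}} \tag{2.2}$$

with $\|\mu_g\|_{L^\infty(U)} < 1$. Then: 

$$2\lambda_g = H^g + \sqrt{\det g}, \quad \mu_g = \frac{\overline{h^g}}{2\lambda_g}. \tag{2.3}$$

Let $\nabla$ be the Levi-Civita connection of $g$ and $(\Gamma^k_{ij}(x, y))_{i,j,k=1,2}$ its Christoffel symbols 
on the open set $U \subset M$. Similar to the object above we introduce: 


Definition 2.2. The Hopf-Levi-Civita data of $(U, g|_U)$ is the pair $(h^k, H^k)$, $k = 1, 2$: 

$$h^k := \frac{1}{2}\left(\Gamma^k_{11} - \Gamma^k_{22}\right) - i\Gamma^k_{12} \in C^\infty(U, \mathbb{C}), \quad H^k := \frac{1}{2}\left(\Gamma^k_{11} + \Gamma^k_{22}\right) \in C^\infty(U, \mathbb{R}). \tag{2.4}$$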

The pair $(h^1, h^2)$ is the Christoffel-Hopf data of $g$ while the pair $(H^1, H^2)$ is the 
Christoffel-mean curvature data of $g$. 

A first application of this data concerns the geodesics of $g$, which are provided 
by the well-known equation [8, p. 5660]: 

$$\frac{d^2y}{dx^2} = \Gamma^1_{22}\left(\frac{dy}{dx}\right)^3 + \left(2\Gamma^1_{12} - \Gamma^2_{22}\right)\left(\frac{dy}{dx}\right)^2 + \left(\Gamma^1_{11} - 2\Gamma^2_{12}\right)\frac{dy}{dx} - \Gamma^2_{11}. \tag{2.5}$$

It follows: 

$$\frac{d^2y}{dx^2} = (H^1 - \Re h^1)\left(\frac{dy}{dx}\right)^3 + (\Re h^2 - 2\Im h^1 - H^2)\left(\frac{dy}{dx}\right)^2 + (H^1 + \Re h^1 + 2\Im h^2)\frac{dy}{dx} - (H^2 + \Re h^2)$$

where as usual $\Re z$ and $\Im z$ denote the real and imaginary part, respectively, of a complex 
number $z$. The presence in projective geometry of quantities $\Gamma^i_{jk}$, symmetric in the 
last two indices, is pointed out in [23, p. 347]. 
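
The passage from Christoffel symbols to Hopf-Levi-Civita data in the geodesic equation can be verified coefficient by coefficient. The sketch below uses, as a test case, the Christoffel symbols of the isothermal metric E = 1/(1 + x² + y²) derived from the diagonal-metric formulas; the dictionary layout and function names are choices made here.

```python
def hopf_levi_civita(G):
    """G[k][(i, j)] holds Gamma^k_{ij}; returns {k: (h^k, H^k)} for k = 1, 2."""
    data = {}
    for k in (1, 2):
        h = complex((G[k][(1, 1)] - G[k][(2, 2)]) / 2, -G[k][(1, 2)])
        H = (G[k][(1, 1)] + G[k][(2, 2)]) / 2
        data[k] = (h, H)
    return data


def coeffs_from_christoffel(G):
    """Coefficients of (y')^3, (y')^2, y', 1 in the geodesic equation."""
    return (G[1][(2, 2)],
            2 * G[1][(1, 2)] - G[2][(2, 2)],
            G[1][(1, 1)] - 2 * G[2][(1, 2)],
            -G[2][(1, 1)])


def coeffs_from_hopf(data):
    """The same coefficients written with the Hopf-Levi-Civita data."""
    (h1, H1), (h2, H2) = data[1], data[2]
    return (H1 - h1.real,
            h2.real - 2 * h1.imag - H2,
            H1 + h1.real + 2 * h2.imag,
            -(H2 + h2.real))


def cigar_christoffel(x, y):
    """Christoffel symbols of E = 1/(1 + x^2 + y^2) (isothermal test case)."""
    s = 1 + x * x + y * y
    return {1: {(1, 1): -x / s, (1, 2): -y / s, (2, 2): x / s},
            2: {(1, 1): y / s, (1, 2): -x / s, (2, 2): -y / s}}


if __name__ == "__main__":
    G = cigar_christoffel(0.3, -0.7)
    print(coeffs_from_christoffel(G))
    print(coeffs_from_hopf(hopf_levi_civita(G)))
```

For this test metric the four coefficients are x/s, −y/s, x/s, −y/s with s = 1 + x² + y², so the right-hand side factors as (x y′ − y)(1 + y′²)/s.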


For practical applications we restrict our formulae to some particular cases of the 
metric. Namely, we consider the diagonal case of the metric, i.e. $g_{12} = 0$, which implies 
that $h^g$ is also real-valued, like $H^g$: 

$$h^g(x, y) = \frac{1}{2}\left(g_{11}(x, y) - g_{22}(x, y)\right) \tag{2.6}$$

and the Christoffel symbols are provided by the following relations where, as usual, 
the subscript denotes the derivative with respect to the corresponding variable: 

$$\Gamma^1_{11} = \frac{1}{2}(\ln g_{11})_x, \quad \Gamma^1_{12} = \frac{1}{2}(\ln g_{11})_y, \quad \Gamma^1_{22} = -\frac{(g_{22})_x}{2g_{11}} \tag{2.7}$$

$$\Gamma^2_{11} = -\frac{(g_{11})_y}{2g_{22}}, \quad \Gamma^2_{12} = \frac{1}{2}(\ln g_{22})_x, \quad \Gamma^2_{22} = \frac{1}{2}(\ln g_{22})_y \tag{2.8}$$


which yield by a straightforward computation: 

Proposition 2.3. The Hopf-Levi-Civita data on $U$ of a diagonal metric $g$ is given by: 

$$h^1 = \frac{1}{2g_{11}}\left[H^g_x - i(g_{11})_y\right], \quad H^1 = \frac{h^g_x}{2g_{11}}, \qquad h^2 = -\frac{1}{2g_{22}}\left[H^g_y + i(g_{22})_x\right], \quad H^2 = -\frac{h^g_y}{2g_{22}}. \tag{2.9}$$

In particular, if $g_{kk}$ depends only on $x^k$ for $k \in \{1, 2\}$ then the Christoffel-Hopf 
data is also real-valued: 

$$h^1 = \frac{H^g_x}{2g_{11}} = \frac{(g_{11})_x}{4g_{11}} = \frac{\Gamma^1_{11}}{2}, \quad h^2 = -\frac{H^g_y}{2g_{22}} = -\frac{(g_{22})_y}{4g_{22}} = -\frac{\Gamma^2_{22}}{2}. \tag{2.10}$$
Remark 2.4 The Formulae (2.9) can be unified in: 

$$h^k = \frac{(-1)^{k+1}}{2g_{kk}}\left[H^g_k + (-1)^k i\,(g_{kk})_{k+1}\right], \quad H^k = \frac{(-1)^{k+1} h^g_k}{2g_{kk}} \tag{2.11}$$

where a lower index $k$ denotes the derivative with respect to $x^k$ and, for the derivative 
with respect to $x^{k+1}$ in $h^k$, the index $k + 1$ is considered 
modulo 2. We remark that if $g_{11}$ depends only on $x$ and $g_{22}$ depends only on $y$ then the 
diagonal metric $g$ can be reduced to the Euclidean form. Indeed, with $g_{11} = A^2(x)$ 
and $g_{22} = B^2(y)$ we change the coordinates according to $du = A\,dx$ and $dv = B\,dy$ 
obtaining $g = du^2 + dv^2$. 


We discuss now large classes of examples and apply the new tools in their study. 


Example 2.5 Suppose that the given atlas is an isothermal or conformal one, 
i.e. $g_{11} = g_{22} = E(x, y) = H^g$. Then $h^g = H^1 = H^2 = 0$, which means that the 
Christoffel-mean curvature data vanishes. Maybe this fact offers an explanation for 
the suitability of isothermal coordinates in the geometry of minimal surfaces. The 
Hopf parts are: 

$$h^1 = \frac{1}{2E}(E_x - iE_y), \quad h^2 = -\frac{1}{2E}(E_y + iE_x) = -ih^1. \tag{2.12}$$

The complex coordinate $z = x + iy$ together with its conjugate $\bar{z} = x - iy$ yields 
the Wirtinger derivatives: 

$$\partial := \partial_z = \frac{1}{2}(\partial_x - i\partial_y), \quad \bar{\partial} := \partial_{\bar{z}} = \frac{1}{2}(\partial_x + i\partial_y). \tag{2.13}$$

It follows for $E$ considered as a function of $(z, \bar{z})$: 

$$h^1(z, \bar{z}) = \frac{E_z}{E} = (\ln E)_z, \quad h^2(z, \bar{z}) = -\frac{iE_z}{E} \tag{2.14}$$
and hence an isothermal anti-holomorphic metric, i.e. satisfying $E_z = 0$, has a vanishing 
Hopf-Levi-Civita data. 

Recall that the Gaussian (sectional) curvature $K$ of $g$ expressed in isothermal 
coordinates reads: 

$$K = -\frac{\Delta \ln E}{2E} = -\frac{2\partial\bar{\partial}(\ln E)}{E} \tag{2.15}$$

with $\Delta$ being the usual Laplacian in real coordinates, $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2 = 
4\partial\bar{\partial} = 4\bar{\partial}\partial$. In conclusion: 

$$K(z, \bar{z}) = -\frac{2\bar{\partial} h^1}{E} \tag{2.16}$$

and hence flat metrics are characterized by holomorphic $h^1 = h^1(z)$, and consequently 
holomorphic $h^2$. In conclusion, a flat isothermal metric has a holomorphic 
Christoffel-Hopf data. 
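
The isothermal formulas are easy to probe with finite differences. The sketch below evaluates the Hopf parts and the curvature for a conformal factor supplied as a Python function; the step size, the central-difference scheme and the function names are assumptions of this illustration, and the test factor E(x, y) = y⁻² is the classical Poincaré half-plane example.

```python
import math


def hopf_isothermal(E, x, y, step=1e-4):
    """h^1 = (E_x - i E_y)/(2E) and h^2 = -i h^1, via central differences."""
    Ex = (E(x + step, y) - E(x - step, y)) / (2 * step)
    Ey = (E(x, y + step) - E(x, y - step)) / (2 * step)
    h1 = complex(Ex, -Ey) / (2 * E(x, y))
    return h1, -1j * h1


def curvature_isothermal(E, x, y, step=1e-4):
    """K = -Laplacian(ln E)/(2E), with a five-point Laplacian."""
    def log_e(u, v):
        return math.log(E(u, v))

    laplacian = (log_e(x + step, y) + log_e(x - step, y)
                 + log_e(x, y + step) + log_e(x, y - step)
                 - 4 * log_e(x, y)) / step ** 2
    return -laplacian / (2 * E(x, y))


def poincare(x, y):
    return y ** -2


if __name__ == "__main__":
    print(hopf_isothermal(poincare, 0.3, 1.5))
    print(curvature_isothermal(poincare, 0.3, 1.5))
```

For the half-plane factor the computed values approach h¹ = i/y, h² = 1/y and K = −1.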

Returning to the general real case we note the remarkable particular case of the 
generalized Poincaré metric $E = E(y)$ for which: 

$$h^1 = -\frac{iE_y}{2E} = -\frac{i}{2}(\ln E)_y, \quad h^2 = -\frac{E_y}{2E} = -\frac{1}{2}(\ln E)_y, \quad K = -\frac{(\ln E)_{yy}}{2E}. \tag{2.17}$$

For the classical example of the Poincaré metric $E(y) = y^{-2}$ we get: $h^1 = \frac{i}{y}$, $h^2 = \frac{1}{y}$. 
Another remarkable example of diagonal isothermal metric is the Hamilton cigar 
metric, which is a steady gradient Ricci soliton supported on $M = \mathbb{R}^2$ by the metric 
$g_{cigar}$: $g_{11} = g_{22} = \frac{1}{1 + x^2 + y^2}$. Its Christoffel symbols are computed in [1, p. 24] and 
then it results $h^1 = -\frac{\bar{z}}{1 + |z|^2}$. 


Example 2.6 We remark that the equality $h^2 = -ih^1$ of (2.12) holds if we ask for the 
vanishing of the right-hand side of (2.5). This means that all geodesics of $(M^2, g)$ are 
lines $x =$ constant and $y = ax + b$ and hence $(M^2, g)$ is a projective Euclidean 
space. Indeed, from: 

$$\Gamma^1_{22} = \Gamma^2_{11} = 0, \quad \Gamma^2_{22} = 2\Gamma^1_{12}, \quad \Gamma^1_{11} = 2\Gamma^2_{12} \tag{2.18}$$

it results: 

$$h^1 = \Gamma^2_{12} - i\Gamma^1_{12}, \quad h^2 = -\Gamma^1_{12} - i\Gamma^2_{12} = -ih^1. \tag{2.19}$$


As an example of projective Euclidean space we have the unit sphere $S^2 = S^2(1)$ 
with its round metric provided by the central projection: 

$$g_{proj} = \frac{(1 + y^2)\,dx^2 - 2xy\,dx\,dy + (1 + x^2)\,dy^2}{(1 + x^2 + y^2)^2} \tag{2.20}$$

with the only non-zero Christoffel symbols [16, p. 471] (the case [3.4]): 


The Hopf-Levi-Civita Data of Two-Dimensional Metrics 93 


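$$\Gamma^1_{11} = 2\Gamma^2_{12} = \frac{-2x}{1 + x^2 + y^2}, \quad \Gamma^2_{22} = 2\Gamma^1_{12} = \frac{-2y}{1 + x^2 + y^2} \tag{2.21}$$

and hence $h^1(z, \bar{z}) = -\frac{\bar{z}}{1 + |z|^2}$, exactly as for the cigar soliton. This means that in spite of 
different Hermitian data the metrics $g_{proj}$ and $g_{cigar}$ have the same Christoffel-Hopf 
data $(h^1, h^2)$. The Eq. (2.5) for the cigar soliton is decomposable: 

$$\frac{d^2y}{dx^2} = \frac{1}{1 + x^2 + y^2}\left(x\frac{dy}{dx} - y\right)\left[1 + \left(\frac{dy}{dx}\right)^2\right] \tag{2.22}$$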

and it follows that the lines through the origin, of equations $y = ax$ with a real parameter 
$a$, are geodesics of $g_{cigar}$. The study of geodesics for $g_{cigar}$ in polar coordinates $(r, \varphi)$ 
appears in [1, p. 25] and we note that in polar coordinates $g_{proj}$ is: 

$$g_{proj} = \frac{dr^2}{(1 + r^2)^2} + \frac{r^2\,d\varphi^2}{1 + r^2}, \quad K_{proj} = 1. \tag{2.23}$$

A relationship between $g_{proj}$ and the potential of the Hamilton cigar soliton 
will be discussed in Example 6.4. We also point out that $g_{proj}$ can be written as: 

$$g_{proj} = \frac{1}{(1 + x^2 + y^2)^2}\left[(dx^2 + dy^2) + (x\,dy - y\,dx)^2\right] \tag{2.24}$$

and we remark the occurrence of the numerator of the classical 1-form: 

$$\eta = \frac{x\,dy - y\,dx}{x^2 + y^2}$$

which is closed but not exact on the punctured plane $\mathbb{R}^2 \setminus \{O(0, 0)\}$. But the restriction 
of $\eta$ to, say, the right half-plane $x > 0$ is an exact form. 
The classical differential system of geodesics for $g_{proj}$ integrates to: 

$$\dot{x} = C_1(1 + x^2 + y^2), \quad \dot{y} = C_2(1 + x^2 + y^2), \quad C_1, C_2 \in \mathbb{R}$$

and its general solution is: 

$$x(t) = \tan(Ct)\cos\varphi, \quad y(t) = \tan(Ct)\sin\varphi, \quad C, \varphi \in \mathbb{R}.$$
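
The stated general solution can be checked against the first-order system by numerical differentiation. The sketch below assumes the integrated system has the form ẋ = C₁(1 + x² + y²), ẏ = C₂(1 + x² + y²) with C₁ = C cos φ and C₂ = C sin φ; the step size and function names are illustrative.

```python
import math


def geodesic(t, C, phi):
    """Candidate geodesic x(t) = tan(Ct) cos(phi), y(t) = tan(Ct) sin(phi)."""
    r = math.tan(C * t)
    return r * math.cos(phi), r * math.sin(phi)


def velocity(t, C, phi, step=1e-6):
    """Central-difference approximation of (dx/dt, dy/dt)."""
    x1, y1 = geodesic(t + step, C, phi)
    x0, y0 = geodesic(t - step, C, phi)
    return (x1 - x0) / (2 * step), (y1 - y0) / (2 * step)


def residuals(t, C, phi):
    """Deviation of the velocity from C_i * (1 + x^2 + y^2)."""
    x, y = geodesic(t, C, phi)
    vx, vy = velocity(t, C, phi)
    s = 1 + x * x + y * y
    return vx - C * math.cos(phi) * s, vy - C * math.sin(phi) * s


if __name__ == "__main__":
    print(residuals(0.4, 0.8, 0.5))
```

Since y/x = tan φ is constant along the curve, these geodesics are straight lines through the origin, in agreement with the projective Euclidean character of the metric.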


Example 2.7 Suppose that the given metric is a warped one: 

$$g_{11} = 1, \quad g_{22} = f^2(x), \quad f \in C^\infty(I \subseteq \mathbb{R}, \mathbb{R}). \tag{2.25}$$

Then its Hopf-Levi-Civita data is: 

$$h^1 = \frac{ff'}{2} = -H^1, \quad h^2 = -\frac{if'}{f}, \quad H^2 = 0. \tag{2.26}$$

Its Gaussian curvature is: 

$$K = -\frac{f''}{f} = -(h^2)^2 - ih^2(\ln h^1)_x = -h^2\left[h^2 + i(\ln h^1)_x\right] \tag{2.27}$$

and then the real-valued function $K$ is a product of two purely imaginary functions; 
as a product of real-valued functions we have: $K = \Im h^2\left[\Im h^2 + (\ln h^1)_x\right]$. Some 
interesting particular cases are: 


(2.6.1) the Euclidean plane geometry is provided in polar coordinates by $f(x) = x$ 
and hence: $h^1 = \frac{x}{2}$, $h^2 = -\frac{i}{x}$, 
(2.6.2) the canonical metric of the sphere $S^2$ is provided by $f(x) = \sin x$ and hence: 
$h^1 = \frac{\sin 2x}{4}$, $h^2 = -i\cot x$, 
(2.6.3) we generalize the above two metrics following the approach of rotationally 
symmetric metrics from [21, pp. 17-18] based on the functions $sn_k$ and 
$cs_k = sn_k'$ for a general parameter $k \in \mathbb{R}$; also $cs_k' = -k\,sn_k$. The warped 
metric is given by $f(x) = sn_k(x)$ and then: 

$$h^1 = \frac{sn_k(x)\,cs_k(x)}{2}, \quad h^2 = -\frac{i\,cs_k(x)}{sn_k(x)}. \tag{2.28}$$

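Example 2.8 Another example of diagonal metric is provided by metrics in 
geodesic polar coordinates, when $g_{11} = 1$ and $g_{22} = G(x, y)$. With the computations 
of [8, p. 5661] we have: 

$$\Gamma^1_{11} = \Gamma^1_{12} = \Gamma^2_{11} = 0, \quad \Gamma^1_{22} = -\frac{G_x}{2}, \quad \Gamma^2_{12} = \frac{G_x}{2G}, \quad \Gamma^2_{22} = \frac{G_y}{2G}. \tag{2.29}$$

Then: 

$$h^1 = \frac{G_x}{4} = -H^1, \quad h^2 = -\frac{G_y + 2iG_x}{4G}, \quad H^2 = \frac{G_y}{4G}. \tag{2.30}$$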
The Gaussian curvature is expressed in terms of $h^1$ as: 

$$K = \left(\frac{2h^1}{G}\right)^2 - \frac{2h^1_x}{G}. \tag{2.31}$$


Example 2.9 The simplest Riemannian metric is the Euclidean one. Suppose 
that $M^2$ is parallelizable with the basis $E = \{E_1, E_2\}$ and define the metric $g$ making 
this frame orthonormal. If the structure coefficients $C$ are defined by $[E_i, E_j] = 
C^k_{ij}E_k$, then the Levi-Civita connection is: 


1 
Typ = 5ICi — Cie + Cuil. (2.32) 


But only [£, E>] can be different to zero; then '}, = 3, = 7, = 0 and: 


Cc. Cl 
Ty = Ci, Ti, =-T,=Ch, f= aa -icl,W#=-# (2.33) 
and hence its Gaussian curvature is: 
K= Riv = (Cin) — (Ciy)y = (Ci, (2.34) 
The Eq. (2.5) of its geodesics is: 
d’y dy\* dy\* 
aa Ch (2) + 2C}, (2) —C}p. (2.35) 


For the example of the solvable non-Abelian case, i.e. $C^1_{12} = 1$, $C^2_{12} = 0$, we have the 
pseudospherical metric with $K = -1$ and the Riccati equation: 

$$\frac{d^2y}{dx^2} = 2\left(\frac{dy}{dx}\right)^2 + 1. \tag{2.36}$$

With WolframAlpha it results the solution: 

$$y(x) = c_2 - \frac{1}{2}\ln\left[\cos(\sqrt{2}x + c_1)\right] \tag{2.37}$$

where $c_1$ and $c_2$ are real constants. The classical differential system of geodesics is: 

$$\ddot{x} + 2\dot{x}\dot{y} = 0, \quad \ddot{y} - \dot{x}^2 = 0 \tag{2.38}$$

with the quadratic first integral: $\dot{x}^2 + 2\dot{y}^2 =$ constant. 
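
The closed-form solution can be substituted back into the second-order equation numerically. The sketch below assumes the equation reads y″ = 2(y′)² + 1 and the solution is y = c₂ − ½ ln cos(√2 x + c₁); it uses central differences, and the constants and step size are arbitrary choices for illustration.

```python
import math


def solution(x, c1, c2):
    """y(x) = c2 - (1/2) * ln(cos(sqrt(2) * x + c1))."""
    return c2 - 0.5 * math.log(math.cos(math.sqrt(2.0) * x + c1))


def residual(x, c1, c2, step=1e-4):
    """y'' - (2 * (y')**2 + 1), estimated with central differences."""
    y_plus = solution(x + step, c1, c2)
    y_zero = solution(x, c1, c2)
    y_minus = solution(x - step, c1, c2)
    d1 = (y_plus - y_minus) / (2 * step)
    d2 = (y_plus - 2 * y_zero + y_minus) / step ** 2
    return d2 - (2 * d1 * d1 + 1)


if __name__ == "__main__":
    print(residual(0.2, 0.1, 0.0))
```

The check is only meaningful on intervals where cos(√2 x + c₁) is positive, so that the logarithm is defined.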


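Returning to the general case let $(x^1, x^2) \to (\tilde{x}^1, \tilde{x}^2)$ be a change of local coordinates 
on $M$. Recall the change of Christoffel symbols: 

$$\tilde{\Gamma}^k_{ij}\frac{\partial x^a}{\partial \tilde{x}^k} = \frac{\partial^2 x^a}{\partial \tilde{x}^i \partial \tilde{x}^j} + \Gamma^a_{bc}\frac{\partial x^b}{\partial \tilde{x}^i}\frac{\partial x^c}{\partial \tilde{x}^j}. \tag{2.39}$$

It follows: 

$$2\tilde{H}^k\frac{\partial x^a}{\partial \tilde{x}^k} = \Delta_{can}x^a + \Gamma^a_{bc}\left(\frac{\partial x^b}{\partial \tilde{x}^1}\frac{\partial x^c}{\partial \tilde{x}^1} + \frac{\partial x^b}{\partial \tilde{x}^2}\frac{\partial x^c}{\partial \tilde{x}^2}\right) \tag{2.40}$$

where $\Delta_{can}$ is the Laplacian (of Euclidean type) with respect to the tilde coordinates: 
$\Delta_{can} = \frac{\partial^2}{\partial(\tilde{x}^1)^2} + \frac{\partial^2}{\partial(\tilde{x}^2)^2}$. Hence the possible tensorial character of the data $(H^1, H^2)$ 
remains an open problem. 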

Since we arrive at the necessity to discuss special types of coordinates we recall, 
after Exercise 6 of pp. 56-57 from the second edition of [21], that for a fixed point 
$p \in M$ and its normal coordinates the metric $g$ is expressed as: 


gi=1—K(p)y*, g2=K(p)xy, go =1—K(p)x’. (2.41) 


It follows that in normal coordinates around $p$ we have a geometrical interpretation 
of the pair $(h^g, H^g)$ in terms of the Gaussian curvature: 


K(p) 


De os fl 2 
Oar + y)=1 5 |z| 
(2.42) 


ae Si ait ) 


and the Beltrami data is: 


2 
rT eee ee > 1 = K 7 - (2.43 
=" (p)|z|7) Mg (p) | = cour | (2.43) 


3 The Second-Hopf and Hopf-Shape Data for a Regular 
Surface 


Suppose now that $(M^2, g, i)$ is an oriented regular surface in Euclidean 3-geometry, 
i.e. $i : (M, g) \to \mathbb{E}^3 = (\mathbb{R}^3, g_{can})$ is an isometric immersion. Let $b = (b_{ij})_{i,j=1,2}$ be 
the second fundamental form and $S = (s_{ij})_{i,j=1,2}$ the corresponding shape operator. 
In addition to the Hermitian and Hopf-Levi-Civita data we have: 


Definition 3.1 The second-Hopf data of $(M, g)|_U$ is: 

$$\left( h^b := \frac{b_{11} - b_{22}}{2} - ib_{12} \in C^\infty(U, \mathbb{C}), \quad H^b := \frac{b_{11} + b_{22}}{2} \in C^\infty(U, \mathbb{R}) \right). \tag{3.1}$$

The Hopf-shape data of $(M, g)|_U$ is: 

$$\left( h^S := \frac{s_{11} - s_{22}}{2} - is_{12} \in C^\infty(U, \mathbb{C}), \quad H^S := \frac{s_{11} + s_{22}}{2} \in C^\infty(U, \mathbb{R}) \right). \tag{3.2}$$


Then the mean curvature $H(g)$ of $(M, g)$ is exactly $H^S$. We note that both $h^b$ 
and $h^S$ are real-valued if and only if the parametric lines are exactly the curvature 
lines of $M$. Suppose now that $g$ is expressed in isothermal coordinates; for example, 
the Enneper surface and the catenoid are minimal surfaces and are usually expressed 
in isothermal coordinates. The Hopf characterization of CMC surfaces through the 
holomorphy of the function $h^b$ is also Proposition 1.4 from [15, p. 21]. Moreover, 
on a non-minimal CMC surface there exists a special type of isothermal coordinates 
provided by Theorem 1.5 of the same book (p. 22); in this system of coordinates we 
have: 


e” 


a’ bi, =e’ cosha, bi>=0, by» =e’ sinha (3.3) 


Gul = 922 = 
with w a solution of the sinh-Gordon equation. It follows immediately the Hopf parts: 


1 ee 


w=. W=He, He= _. 3.4 
5 es 5 (3.4) 


For the general case of non-isothermal coordinates we can derive a necessary
condition for the holomorphy of $h^b$ as well as the (Euclidean) gradient vector field
of $H^b$.


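Proposition 3.2 Suppose that the local chart $(x, y)$ on $M^2$ is not isothermal and
the function $h^b$ is holomorphic. Then:


$$(\Gamma^k_{12}b_{k2} - \Gamma^k_{22}b_{k1})_y = (\Gamma^k_{12}b_{k1} - \Gamma^k_{11}b_{k2})_x, \qquad (3.5)$$

and the Euclidean gradient $\nabla_{can}H^b := (H^b_x, H^b_y)$ of $H^b$ is:

$$\nabla_{can}H^b = \left(\Gamma^k_{12}b_{k2} - \Gamma^k_{22}b_{k1},\ \Gamma^k_{12}b_{k1} - \Gamma^k_{11}b_{k2}\right). \qquad (3.6)$$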


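Proof The Cauchy-Riemann equations for $h^b$ are:


$$\frac{1}{2}(b_{11} - b_{22})_x = -(b_{12})_y, \quad \frac{1}{2}(b_{11} - b_{22})_y = (b_{12})_x \qquad (3.7)$$

and the general Codazzi equations are:


$$(b_{12})_x - (b_{11})_y = \Gamma^k_{11}b_{k2} - \Gamma^k_{12}b_{k1}, \quad (b_{22})_x - (b_{12})_y = \Gamma^k_{12}b_{k2} - \Gamma^k_{22}b_{k1}. \qquad (3.8)$$

Combining (3.7) and (3.8) we obtain the partial derivatives of $H^b$:


$$H^b_x = \Gamma^k_{12}b_{k2} - \Gamma^k_{22}b_{k1}, \quad H^b_y = \Gamma^k_{12}b_{k1} - \Gamma^k_{11}b_{k2} \qquad (3.9)$$


which means (3.6). The commuting condition $(H^b_x)_y = (H^b_y)_x$ is exactly (3.5).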


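Example 3.3 Suppose that $b_{11} = b_{22} = 0$ and $b_{12} \neq 0$. The Codazzi formulae (3.8)
reduce to:


$$(b_{12})_x = (\Gamma^1_{11} - \Gamma^2_{12})b_{12}, \quad (b_{12})_y = (\Gamma^2_{22} - \Gamma^1_{12})b_{12} \qquad (3.10)$$

while (3.5)-(3.6) mean:

$$[(\Gamma^1_{12} - \Gamma^2_{22})b_{12}]_y = [(\Gamma^2_{12} - \Gamma^1_{11})b_{12}]_x, \quad \nabla_{can}H^b = -\left((b_{12})_y, (b_{12})_x\right). \qquad (3.11)$$


But $H^b = 0$ and then $b_{12}$ is a constant and hence the holomorphic $h^b$ is a constant:
$h^b = -ib_{12}$.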


In the same manner we treat the holomorphy of the function $h^g$ even for the general
case of the previous section:


Proposition 3.4 Suppose that $(M^2, g)$ is a non-diagonal Riemannian two-dimensional
manifold and its $h^g$ is holomorphic. Then:


(HE +1%,H? + 0H) ga + (HS +H +The =0. (3.12) 
Also, the Euclidean gradient of the function $\Re h^g$ is:

$$\nabla_{can}(\Re h^g) = \left(-(\Gamma^k_{12}g_{k2} + \Gamma^k_{22}g_{k1}),\ \Gamma^k_{11}g_{k2} + \Gamma^k_{12}g_{k1}\right). \qquad (3.13)$$


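Proof The Cauchy-Riemann equations for $h^g$ are:

$$\frac{1}{2}(g_{11} - g_{22})_x = -(g_{12})_y, \quad \frac{1}{2}(g_{11} - g_{22})_y = (g_{12})_x \qquad (3.14)$$


and the metrizability of the Levi-Civita connection means:

$$(g_{12})_y = \Gamma^k_{12}g_{k2} + \Gamma^k_{22}g_{k1}, \quad (g_{12})_x = \Gamma^k_{11}g_{k2} + \Gamma^k_{12}g_{k1}. \qquad (3.15)$$

Hence:

$$\frac{1}{2}(g_{11} - g_{22})_x = -(\Gamma^k_{12}g_{k2} + \Gamma^k_{22}g_{k1}), \quad \frac{1}{2}(g_{11} - g_{22})_y = \Gamma^k_{11}g_{k2} + \Gamma^k_{12}g_{k1} \qquad (3.16)$$

which gives (3.13). Again the commutativity of second order derivatives means:


$$(\Gamma^k_{11}g_{k2} + \Gamma^k_{12}g_{k1})_x = -(\Gamma^k_{12}g_{k2} + \Gamma^k_{22}g_{k1})_y \qquad (3.17)$$

and then, a straightforward computation yields the equation: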


[Wt + 0S)2 + arnt, + 1%) — Rin + TECH + ry,)] Garr 
+ [C4 + P%)y + 20, (PY, + PH) + Ri + rei + Ts) ] Jai = 9, 
(3.18) 


where R is the curvature tensor field of g. Due to the equalities gq2 Rf. = gai R}>, = 9 
and using the notations of first section we get (3.12). 


Example 3.5 If $M^2$ is the graph $\mathbb{R}^3 \ni z = f(x, y)$ then:


$$h^g = \frac{f_x^2 - f_y^2}{2} - i f_x f_y \qquad (3.19)$$


and hence, its holomorphy is expressed by the Cauchy-Riemann equations:


$$(f_x^2 - f_y^2)_x = -2(f_x f_y)_y, \quad (f_x^2 - f_y^2)_y = 2(f_x f_y)_x. \qquad (3.20)$$


Equivalently:

$$f_x(f_{xx} + f_{yy}) = 0, \quad f_y(f_{xx} + f_{yy}) = 0 \qquad (3.21)$$

and from the regularity of $M^2$ it results that $f$ is a (Euclidean) harmonic function:

$$f_{xx} + f_{yy} = 0 \qquad (3.22)$$


and then we have an infinite number of graphs with holomorphic $h^g$; for example,
these are provided by the real and imaginary parts of the natural powers $z^n$, [11,
p. 88]. Also, we recall that given two graphs $G_u : z = u(x, y)$, $G_v : z = v(x, y)$,
if $f = u + iv$ is a holomorphic function then $G_u$ and $G_v$ share the same Gaussian
curvature $K_f = -\left[\frac{|f''|}{1 + |f'|^2}\right]^2 \leq 0$, [14, p. 128]. We present only two graphs for which
$h^g(z)$ is proportional to $z^2$ and $z^{-2}$.


Example 3.6 The hyperbolic paraboloid $(hP)_c : \mathbb{R}^3 \ni z = c(x^2 - y^2)$ with $c \neq 0$
has the holomorphic $h^g$: $h^g(z = x + iy) = 2c^2z^2$. The gradient of its real part is:
$\nabla(\Re h^g)(x, y) = 4c^2(x, -y) = 4c^2\bar{z}$. The condition (3.12) is satisfied since $H^1 =
H^2 = 0$. Also:

1 4c?x 2 —4c?y i ee (3.23) 


h SS a h = , 
14 4c?|z|? 14 4c?|z|? V1 + 4c?|z|? 


2 
$= i Peet 2-2). (3.24) 
[1 + 4c?|z|?]2 
If the hyperbolic paraboloid is expressed as $(hP)^c : \mathbb{R}^3 \ni z = cxy$ then $h^g(z) =
-\frac{c^2}{2}z^2$ and $\nabla(\Re h^g)(x, y) = c^2(y, -x) = -c^2(iz)$.
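
The closed form of Example 3.6 can be checked numerically. The sketch below assumes the Hermitian data of a metric is $h^g = \frac{1}{2}(g_{11} - g_{22}) - i g_{12}$, which is the convention behind the Cauchy-Riemann equations (3.14), and that the graph metric is $g_{ij} = \delta_{ij} + f_i f_j$. The function names are illustrative and not taken from the text.

```python
def hopf_invariant_of_graph(fx: float, fy: float) -> complex:
    """Return h^g = (g11 - g22)/2 - i*g12 for the graph metric g_ij = delta_ij + f_i*f_j."""
    g11 = 1.0 + fx * fx
    g12 = fx * fy
    g22 = 1.0 + fy * fy
    return complex((g11 - g22) / 2.0, -g12)


def hyperbolic_paraboloid_gradient(c: float, x: float, y: float) -> tuple:
    """Gradient (f_x, f_y) of f(x, y) = c*(x^2 - y^2)."""
    return 2.0 * c * x, -2.0 * c * y


def max_deviation(c: float, points) -> float:
    """Largest |h^g - 2 c^2 z^2| over the sample points, with z = x + iy."""
    worst = 0.0
    for x, y in points:
        fx, fy = hyperbolic_paraboloid_gradient(c, x, y)
        h = hopf_invariant_of_graph(fx, fy)
        worst = max(worst, abs(h - 2.0 * c * c * complex(x, y) ** 2))
    return worst


if __name__ == "__main__":
    sample = [(0.3, -1.2), (2.0, 0.5), (-1.1, -0.4)]
    # The deviation should be at rounding-error level.
    print(max_deviation(0.7, sample))
```

Swapping in another gradient function gives the same kind of check for any other graph.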


Example 3.7 The (funnel) revolution surface $M^2 : \mathbb{R}^3 \ni z = \ln(x^2 + y^2)$ has the
holomorphic $h^g(z = x + iy) = 2z^{-2}$. Again the condition (3.12) is satisfied with
$H^1 = H^2 = 0$. The gradient of its real part is:


42(2z — z* — 2’) 


V(2AY)(x, y) = Ze (x(—x? + 3y’), y(x? — 3y’)) = A (3.25) 
m G 
Also: 
—4x —Ay —2 
Ve, PS, WE 
z7(\z/? + 4) z7(\z|? + 4) Zz|/|z/2 +4 
=o eae 
es (Iz|" + 2)z (3.27) 


[282 +4)2- 


Example 3.8 Since the Beltrami coefficient involves the conjugate $\overline{h^g}$ we ask for
graphs with anti-holomorphic $h^g$. With the same approach as above we obtain the
conditions:


$$f_{xx} = f_{yy}, \quad f_{xy} = 0 \qquad (3.28)$$


and it results the (elliptic) paraboloid of revolution $(eP)_c : \mathbb{R}^3 \ni z = c(x^2 + y^2)$ with
$h^g(z = x + iy) = 2c^2\bar{z}^2$.


Example 3.9 We remark that $(eP)_c$ can be obtained from $(hP)_c$ through a Wick
rotation $y \to iy$, [10]. The same transformation applied to the surface of Example 3.7
gives the surface $M^2 : \mathbb{R}^3 \ni z = \ln(x^2 - y^2)$ for $|x| > |y|$. Its Hermitian data is:


82 A(x? + y?) 16\zI° 
h9 = = ee HY = 2 = ‘ 
(z=x+iy) (4D (x, y) + (x2 — y2)2 (z2 + 22)2 
(3.29) 


Hence the Wick rotation does not transport the holomorphic property from h 
to hy, 


Remark 3.10 Fix $\nabla$ an arbitrary linear connection on $M^2$ with the Christoffel
symbols $\Gamma^k_{ij}$. Suppose that the (0, 3)-tensor field $\nabla g$ is totally symmetric, [19, p. 155]:


$$\nabla g(\partial_x, \partial_y, \partial_x) = \nabla g(\partial_y, \partial_x, \partial_x), \quad \nabla g(\partial_x, \partial_y, \partial_y) = \nabla g(\partial_y, \partial_x, \partial_y).$$

These two equations mean the Cauchy-Kowalevski differential system (31) of the
cited paper which, inserted in our Cauchy-Riemann equations (3.14), yields the
following conditions for the holomorphy of $h^g$:


Ag = (Fs = i) 92+ Ting —Tyou, AY = (i = Ri) g2 +P 3,911 — Phy 92. 


Another setting in which the Hopf invariant is useful consists in hyperbolic
surfaces, which have a negative Gaussian curvature. Such a surface $M^2$ can be
parametrized by its (real) asymptotic coordinates $(\alpha, \beta)$ and from [22, p. 89] we
have:


i= —Tlin(— Kp, rae —Tlin(—K)]. (3.30) 
and hence, a hyperbolic surface is a pseudospherical one, i.e. $K$ is a negative constant,
if and only if both $h^1$ and $h^2$ are real numbers. The pseudospherical metric of solvable
non-Abelian case from Example 2.8 is not expressed in asymptotic coordinates since 
ist = Chi, 

We finish this section with a matrix approach, following the notations from the
Introduction and inspired by the relation $S = b \cdot g^{-1}$. We point out that the commuting
relation $g^{-1} \cdot b = b \cdot g^{-1}$ means:


$$g_{12}(b_{11} - b_{22}) = b_{12}(g_{11} - g_{22}) \qquad (3.31)$$

or, equivalently:

$$\Im h^g \cdot \Re h^b = \Im h^b \cdot \Re h^g \qquad (3.32)$$

which says that the complex functions $h^g$, $h^b$ have the same argument. As an example,
the relation (3.31) holds for an isothermal first fundamental form $g$ while for a graph
surface it reads:


$$f_x f_y(f_{xx} - f_{yy}) = f_{xy}(f_x^2 - f_y^2). \qquad (3.33)$$


Proposition 3.11 For a surface $(M^2, g) \subset \mathbb{R}^3$ satisfying the commuting relation
$g^{-1} \cdot b = b \cdot g^{-1}$ we have the matrix equality:


Gig ys Far sie (3.34) 


and then the mean curvature and Gaussian curvature of $g$ are:

$$H(g) = \frac{H^gH^b - h^g\overline{h^b}}{\det g} = \frac{H^gH^b - \overline{h^g}h^b}{\det g}, \qquad K(g) = \frac{(H^gH^b - h^g\overline{h^b})^2 - |H^gh^b - H^bh^g|^2}{(\det g)^2}. \qquad (3.35)$$

Proof From: 
c HY hg c c HP? he 
g ace ) detg° =detg>0, D ne (3.36) 
it results: 
1 (HSH? —hsh> 9h» — Hens 
PG) = ae 3.37 
Oy eee Ge — H°nd HIH® — hah? vey 
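The equality of entries from the principal diagonal in (3.37):

$$h^g\overline{h^b} = \overline{h^g}h^b \in \mathbb{R} \qquad (3.38)$$


is exactly (3.32). A straightforward computation of $S^c$ from:


$$S = \frac{1}{\det g}\begin{pmatrix} g_{22}b_{11} - g_{12}b_{12} & g_{22}b_{12} - g_{12}b_{22} \\ g_{11}b_{12} - g_{12}b_{11} & g_{11}b_{22} - g_{12}b_{12} \end{pmatrix} = \frac{1}{\det g}\begin{pmatrix} g_{22}b_{11} - g_{12}b_{12} & g_{11}b_{12} - g_{12}b_{11} \\ g_{22}b_{12} - g_{12}b_{22} & g_{11}b_{22} - g_{12}b_{12} \end{pmatrix}$$


gives the conclusion (3.34).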


Example 3.12 The relation (3.33) is satisfied for quadratic graphs: $z = c(ax + by)^2$.


We return to the formula of Example 3.5 in order to compute the Gaussian curvature
of a Blaschke factor $B_a$ defined by $a \in \mathbb{C}$ with modulus $|a| < 1$:


$$B_a(z) = \frac{z - a}{1 - \bar{a}z}. \qquad (3.39)$$

We obtain:

$$K(z) = -\left[\frac{2|a|(1 - |a|^2)|1 - \bar{a}z|}{|1 - \bar{a}z|^4 + (1 - |a|^2)^2}\right]^2 \qquad (3.40)$$


and in particular:


$$K(0) = -\left[\frac{2|a|(1 - |a|^2)}{1 + (1 - |a|^2)^2}\right]^2, \quad K(a) = -\left[\frac{2|a|}{1 + (1 - |a|^2)^2}\right]^2. \qquad (3.41)$$

Hence, we can define the smooth Gauss-Blaschke functions $GB^{1,2} : [0, 1) \to [0, +\infty)$:


$$GB^1(t) = \frac{2t(1 - t^2)}{2 - 2t^2 + t^4}, \quad GB^1(\cos\varphi) = \frac{2\cos\varphi\sin^2\varphi}{2\sin^2\varphi + \cos^4\varphi}, \quad GB^2(t) = \frac{2t}{2 - 2t^2 + t^4}. \qquad (3.42)$$
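
As a sanity check of the two special values, the sketch below evaluates the curvature of the harmonic graphs associated with $B_a$. It assumes the formula $K_f = -|f''|^2/(1 + |f'|^2)^2$ recalled in Example 3.5 and the derivatives $B_a'(z) = (1 - |a|^2)/(1 - \bar{a}z)^2$, $B_a''(z) = 2\bar{a}(1 - |a|^2)/(1 - \bar{a}z)^3$. The function names are illustrative.

```python
def blaschke_curvature(a: complex, z: complex) -> float:
    """Gaussian curvature K = -|B''|^2 / (1 + |B'|^2)^2 for the Blaschke factor B_a at z."""
    w = 1 - a.conjugate() * z
    first = (1 - abs(a) ** 2) / w ** 2
    second = 2 * a.conjugate() * (1 - abs(a) ** 2) / w ** 3
    return -abs(second) ** 2 / (1 + abs(first) ** 2) ** 2


def gb1(t: float) -> float:
    """Gauss-Blaschke function GB^1, expected to equal sqrt(-K(0)) for |a| = t."""
    return 2 * t * (1 - t * t) / (2 - 2 * t * t + t ** 4)


def gb2(t: float) -> float:
    """Gauss-Blaschke function GB^2, expected to equal sqrt(-K(a)) for |a| = t."""
    return 2 * t / (2 - 2 * t * t + t ** 4)


if __name__ == "__main__":
    a = complex(0.3, -0.4)  # |a| = 0.5
    print(blaschke_curvature(a, 0), -gb1(abs(a)) ** 2)
    print(blaschke_curvature(a, a), -gb2(abs(a)) ** 2)
```

Each printed pair should agree up to rounding, for any $a$ in the open unit disk.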


4 The Hopf-Levi-Civita Data for Two Dimensional Kähler
Geometry


Let now $M^2$ be a complex manifold of complex dimension 2 with local coordinates
$(z^1, z^2)$. Fix $(J, g = (g_{i\bar{j}}))$ a Kähler structure on $M$ and let $\nabla = (\Gamma^k_{ij})$ be the
Levi-Civita connection of $g$. We extend $\nabla$ in a $\mathbb{C}$-linear way and then the only possible
non-zero Christoffel symbols are $\Gamma^k_{ij}$ and $\Gamma^{\bar{k}}_{\bar{i}\bar{j}} = \overline{\Gamma^k_{ij}}$. It is well-known that the Kähler
condition gives:

$$\Gamma^k_{ij} = g^{k\bar{l}}\frac{\partial g_{i\bar{l}}}{\partial z^j} = g^{k\bar{l}}\frac{\partial g_{j\bar{l}}}{\partial z^i} \qquad (4.1)$$


with $(g^{i\bar{l}})$ the matrix inverse to $(g_{i\bar{j}})$. Recall that the fundamental 2-form of the
Kähler structure is defined as:


@(X, Y) := g(JX, Y) (4.2) 


on two arbitrary vector fields X, Y. Using the same Definition 2.2 we derive imme- 
diately: 


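Proposition 4.1 The Hopf-Levi-Civita data of a complex two-dimensional Kähler
metric $g$ is:


$$h^k = g^{k\bar{l}}\left[\frac{1}{2}\left(\frac{\partial g_{1\bar{l}}}{\partial z^1} - \frac{\partial g_{2\bar{l}}}{\partial z^2}\right) - i\frac{\partial g_{1\bar{l}}}{\partial z^2}\right], \quad H^k = \frac{g^{k\bar{l}}}{2}\left(\frac{\partial g_{1\bar{l}}}{\partial z^1} + \frac{\partial g_{2\bar{l}}}{\partial z^2}\right). \qquad (4.3)$$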


Suppose that p is a (local) Kahler potential for g which means that w = Sddp. 
Then: 


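$$h^k = g^{k\bar{l}}\frac{\partial}{\partial\bar{z}^l}\left[\frac{1}{2}\left(\frac{\partial^2\rho}{\partial(z^1)^2} - \frac{\partial^2\rho}{\partial(z^2)^2}\right) - i\frac{\partial^2\rho}{\partial z^1\partial z^2}\right] = \frac{g^{k\bar{l}}}{2}\frac{\partial}{\partial\bar{z}^l}\left(\frac{\partial}{\partial z^1} - i\frac{\partial}{\partial z^2}\right)^2\rho, \qquad (4.4)$$

$$H^k = \frac{g^{k\bar{l}}}{2}\frac{\partial}{\partial\bar{z}^l}\left[\frac{\partial^2\rho}{\partial(z^1)^2} + \frac{\partial^2\rho}{\partial(z^2)^2}\right]. \qquad (4.5)$$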

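Example 4.2 The Bergman metric on the two-dimensional unit disk $M = \{z =
(z^1, z^2) \in \mathbb{C}^2;\ f(z) = 1 - |z^1|^2 - |z^2|^2 > 0\}$ is, [17, p. 14]:


$$g_{i\bar{j}} = \frac{\delta_{ij}}{f} + \frac{\bar{z}^iz^j}{f^2} = \frac{1}{f^2}\left(f\delta_{ij} + \bar{z}^iz^j\right), \quad \rho = -\ln f. \qquad (4.6)$$

A long but straightforward computation gives:


$$f(z)h^k = (\delta^k_1 - i\delta^k_2)(\bar{z}^1 - i\bar{z}^2), \quad f(z)H^k = \bar{z}^k. \qquad (4.7)$$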


Example 4.3 The Fubini-Study metric on the complex projective space P?(C) is, 
[17, p. 16]: 


om, a oo 
97=—S2—, FHltlP4IPR, = f@OCl+2z/), p=Inf. 
(4.8) 


Poh =2¢@)-i2", F@OH =e 4+2¢e12 + 7). 4.9) 


Returning to the general Formula (4.4) we note the birth on M? of a remarkable 
differential operator: 


1/oa ) 
d:= ~| — -i- > 4.10 
Ata isa) coe 


yielding a (formal) Laplacian-type operator and a Carlson-Griffiths operator: 


a? a? i 
d° = —(d—9). (4.11) 


A = 400 = 400 = ; 
a(z!)? 7 a(z!)? Qn 


It follows another expression of (4.5) and (4.6): 


2. 
; -a(A 
eel, pi ag OP. (4.12) 
g hs 
re 
5 The Hopf-Levi-Civita Data for Two-Dimensional Hessian
Geometry 


The real counterpart of Kähler geometry is Hessian geometry and, following the
previous section, we treat this case here. Suppose that our real manifold $M^2$ is endowed
with a flat (and symmetric) connection $D$. We adopt, see also Eq. (14) of [6, p.
3338]:


Definition 5.1 ([24, p. 14]) The Riemannian metric $g$ is called D-Hessian if there
exists (a potential) $\rho \in C^\infty(M)$ such that $g$ is exactly the D-Hessian of $\rho$: $g = D(d\rho)$.

Let $(x, y) = (x^1, x^2)$ be an affine coordinates system with respect to $D$. The above
definition means:


$$g_{ij} = \frac{\partial^2\rho}{\partial x^i\partial x^j} \qquad (5.1)$$


which yields the (local) Codazzi equation:


$$\frac{\partial g_{ia}}{\partial x^j} = \frac{\partial g_{ja}}{\partial x^i}. \qquad (5.2)$$


The Christoffel symbols of $g$ are, [24, p. 15]:


$$\Gamma^k_{ij} = \frac{1}{2}g^{kl}\frac{\partial g_{ij}}{\partial x^l} \qquad (5.3)$$


and then we derive: 


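Proposition 5.2 The Hopf-Levi-Civita data of the Hessian metric $g$ is:


$$h^k = \frac{g^{kl}}{2}\left[\frac{1}{2}\left(\frac{\partial g_{1l}}{\partial x} - \frac{\partial g_{2l}}{\partial y}\right) - i\frac{\partial g_{1l}}{\partial y}\right], \quad H^k = \frac{g^{kl}}{4}\left[\frac{\partial g_{1l}}{\partial x} + \frac{\partial g_{2l}}{\partial y}\right]. \qquad (5.4)$$


With respect to the potential $\rho$ it follows:


$$h^k = \frac{g^{kl}}{4}\frac{\partial}{\partial x^l}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right)^2\rho, \quad H^k = \frac{g^{kl}}{4}\frac{\partial}{\partial x^l}(\rho_{xx} + \rho_{yy}). \qquad (5.5)$$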


Remark 5.3 A more detailed version of Formulae (5.4) is:


$$4(\det g)h^1 = g_{22}\left[(g_{11})_x - (g_{12})_y - 2i(g_{11})_y\right] - g_{12}\left[(g_{12})_x - (g_{22})_y - 2i(g_{12})_y\right], \qquad (5.6)$$

$$4(\det g)h^2 = -g_{12}\left[(g_{11})_x - (g_{12})_y - 2i(g_{11})_y\right] + g_{11}\left[(g_{12})_x - (g_{22})_y - 2i(g_{12})_y\right], \qquad (5.7)$$

$$4(\det g)H^1 = g_{22}\left[(g_{11})_x + (g_{12})_y\right] - g_{12}\left[(g_{12})_x + (g_{22})_y\right], \qquad (5.8)$$

$$4(\det g)H^2 = -g_{12}\left[(g_{11})_x + (g_{12})_y\right] + g_{11}\left[(g_{12})_x + (g_{22})_y\right]. \qquad (5.9)$$


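Again we note the birth on $M$ of the differential operators of first order:


$$\partial := \frac{1}{2}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right), \quad \bar{\partial} := \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right) \qquad (5.10)$$

and hence:

$$h^k = g^{kl}\frac{\partial(\partial^2\rho)}{\partial x^l}, \quad H^k = g^{kl}\frac{\partial(\partial\bar{\partial}\rho)}{\partial x^l}. \qquad (5.11)$$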


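Example 5.4 Let us consider the Hessian unit disk $\Delta^1 = \{(x, y) \in \mathbb{R}^2;\ f(x, y) =
1 - x^2 - y^2 > 0\}$ with potential $\rho = -\ln f$. Then:


$$g_{11} = \frac{2(1 + x^2 - y^2)}{f^2}, \quad g_{12} = \frac{4xy}{f^2}, \quad g_{22} = \frac{2(1 - x^2 + y^2)}{f^2} \qquad (5.12)$$

and then the Hermitian data of the metric $g$ is:

$$h^g = \frac{2(x^2 - y^2) - 4ixy}{f^2} = \frac{2(x - iy)^2}{f^2} = 2\left(\frac{\bar{z}}{f}\right)^2, \quad H^g = \frac{2}{f^2}. \qquad (5.13)$$

In the same unit disk we can consider the metric $g_4$ of [20, p. 246]:

$$g_4 = \frac{1 - y^2}{(1 - x^2 - y^2)^2}dx^2 + \frac{2xy}{(1 - x^2 - y^2)^2}dxdy + \frac{1 - x^2}{(1 - x^2 - y^2)^2}dy^2. \qquad (5.14)$$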

Note that $g_3$ is exactly the round metric of the sphere $S^2$ expressed as a graph
and if we apply the double Wick rotation $(x \to ix, y \to iy)$ to $g_4$ we get the metric
$g_{proj}$. Equivalently, $g_3$ is the round metric of $S^2$ provided by orthogonal projection.
Also, the Funk metric on $M$ is of Randers type with the Riemannian metric $g_4$ and
the 1-form $\beta = d\rho$, as is expressed in Example 3.8 of [3, p. 88]. The only non-zero
Christoffel symbols of $g_4$ are:


$$\Gamma^1_{11} = 2\Gamma^2_{12} = \frac{2x}{1 - x^2 - y^2}, \quad \Gamma^2_{22} = 2\Gamma^1_{12} = \frac{2y}{1 - x^2 - y^2}$$


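and then the dual of $\beta$ is the vector field $\xi = -2(1 - x^2 - y^2)^2\left(x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y}\right)$. In
polar coordinates we have:

$$g_4 = \frac{dr^2}{(1 - r^2)^2} + \frac{r^2d\theta^2}{1 - r^2}, \quad K_{g_4} = -1.$$

In conclusion, $g_{proj}$ and $g_4$ can be unified in the metric:


$$g_\varepsilon = \frac{dr^2}{(1 + \varepsilon r^2)^2} + \frac{r^2d\theta^2}{1 + \varepsilon r^2}, \quad K_{g_\varepsilon} = \varepsilon, \quad \varepsilon \in \{+1, -1\} \leftrightarrow \{proj, 4\}.$$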


The cited paper [20] studies the geodesics of $g_4$ through the support function $u =
u(t)$ via the equations: $x = u\sin t + u'\cos t$, $y = -u\cos t + u'\sin t$. Applying the
same approach to $g_{proj}$ we arrive at the differential equation:


$$\frac{1}{u + u''} = \frac{u}{1 + u^2 + (u')^2} \qquad (5.15)$$


which reduces to $uu'' - (u')^2 = 1$ with solution $u(t) = \cosh t$.


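We return now to the general case of a Riemannian metric $g$ on $M^2$. The Hessian of
a smooth scalar field $f \in C^\infty(M)$ with respect to the metric $g$ is another symmetric
covariant tensor field:


$$H_g(f)_{ij} = \frac{\partial^2f}{\partial x^i\partial x^j} - \Gamma^k_{ij}\frac{\partial f}{\partial x^k} \qquad (5.16)$$

and hence its Hopf invariant and mean curvature are:

$$h(H_g(f)) = \frac{f_{xx} - f_{yy}}{2} - if_{xy} - h^kf_k, \quad H(H_g(f)) = \frac{1}{2}\Delta_{can}f - H^kf_k. \qquad (5.17)$$


Let us remark that the condition (3.22) for $f \in C^\infty(\mathbb{R}^2, g_{can})$ means exactly
$H(H_{can}(f)) = \frac{1}{2}\Delta_{can}f = 0$ while the conditions (3.28) mean exactly $h(H_{can}(f))
= 0$. If $(M^2, g)$ is a projective Euclidean space then the Hessian of $f$ has the
components:


$$H_g(f)_{11} = \frac{\partial^2f}{\partial x^2} - 2\Gamma^2_{12}\frac{\partial f}{\partial x}, \quad H_g(f)_{12} = \frac{\partial^2f}{\partial x\partial y} - \Gamma^1_{12}\frac{\partial f}{\partial x} - \Gamma^2_{12}\frac{\partial f}{\partial y}, \quad H_g(f)_{22} = \frac{\partial^2f}{\partial y^2} - 2\Gamma^1_{12}\frac{\partial f}{\partial y}.$$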

Another symmetric covariant tensor field on $(M, g)$ is the Lie derivative of $g$ with
respect to a fixed vector field $X = X^k\frac{\partial}{\partial x^k} = (X^1, X^2)$:


$$(\mathcal{L}_Xg)_{ij} = X^kg_{ij,k} + X^k_{,i}g_{kj} + X^k_{,j}g_{ki}. \qquad (5.18)$$

We associate to $X$ the complex-valued function:

$$f_X = X^1 + iX^2 \qquad (5.19)$$


and we establish a relationship between the Hopf invariant $h(\mathcal{L}_Xg)$ and $f_X$:


Theorem 5.5 Suppose that the metric g is an isothermal one. Then the Hopf invari- 
ant of the Lie derivative is: 


and consequently if X is a conformal Killing vector field then fx is holomorphic. 


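Proof We have:


$$h(\mathcal{L}_Xg) = \frac{1}{2}\left[X^k(g_{11,k} - g_{22,k}) + 2X^k_{,1}g_{k1} - 2X^k_{,2}g_{k2}\right] - i\left[X^kg_{12,k} + X^k_{,1}g_{k2} + X^k_{,2}g_{k1}\right]$$

$$= X^kh^g_{,k} + g_{k1}(X^k_{,1} - iX^k_{,2}) - g_{k2}(X^k_{,2} + iX^k_{,1}).$$


Taking into account the isothermal hypothesis, with $E := g_{11} = g_{22}$, it results:


$$h(\mathcal{L}_Xg) = E\left[(X^1_{,1} - X^2_{,2}) - i(X^1_{,2} + X^2_{,1})\right] \qquad (5.21)$$


which yields the conclusion.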


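Example 5.6 We discuss the class of two-dimensional space-forms given by $K =$
constant. The isometry group $Isom(g)$ has the maximal dimension $\frac{n(n+1)}{2} = 3$.

(5.6.1) $K = c^2 > 0$. The round metric of the sphere $S^2\left(\frac{1}{c}\right)$ provided by the
stereographic projection is:


$$g^p = \frac{4(dx^2 + dy^2)}{[c(1 + x^2 + y^2)]^2} = \left(\frac{2|dz|}{c(1 + |z|^2)}\right)^2 \qquad (5.22)$$

and the basis of its Lie algebra $isom(g^p)$ of Killing vector fields is:

$$X^p_1(x, y) = \left(xy, \frac{y^2 - x^2 + 1}{2}\right) = \frac{(r^2 + 1)\sin\theta}{2}\frac{\partial}{\partial r} + \frac{(1 - r^2)\cos\theta}{2r}\frac{\partial}{\partial\theta},$$

$$X^p_2(x, y) = \left(\frac{x^2 - y^2 + 1}{2}, xy\right) = \frac{(r^2 + 1)\cos\theta}{2}\frac{\partial}{\partial r} + \frac{(r^2 - 1)\sin\theta}{2r}\frac{\partial}{\partial\theta}, \qquad (5.23)$$

$$X^p_3(x, y) = (-y, x) = r(-\sin\theta, \cos\theta)$$

where $p$ means positive. The corresponding holomorphic functions are:


$$f^p_1(z) = \frac{i(1 - z^2)}{2}, \quad f^p_2(z) = \frac{z^2 + 1}{2}, \quad f^p_3(z) = iz. \qquad (5.24)$$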
Recall that the Weierstrass data of the Enneper surface is [13, p. 234]: $M =
\mathbb{C}$, $g(z) = z$, $\eta = dz$ and then:


$$\Phi = \frac{1}{2}\begin{pmatrix} 1 - z^2 \\ i(1 + z^2) \\ 2z \end{pmatrix}dz.$$

Hence we have:

$$i\Phi = \begin{pmatrix} f^p_1(z) \\ -f^p_2(z) \\ f^p_3(z) \end{pmatrix}dz.$$

(5.6.2) $K = -c^2 < 0$. The Poincaré expression of the metric concerns the
upper half-plane model of hyperbolic geometry $\mathbb{C}_+ = \{z \in \mathbb{C};\ \Im z > 0\}$:


$$g^n(x, y) = \frac{dx^2 + dy^2}{(cy)^2} = \left(\frac{|dz|}{cy}\right)^2 \qquad (5.25)$$

and the basis of its Lie algebra $isom(g^n)$ of Killing vector fields is:

$$X^n_1(x, y) = (x^2 - y^2, 2xy) = r^2(\cos 2\theta, \sin 2\theta), \quad X^n_2(x, y) = (x, y) = r(\cos\theta, \sin\theta), \quad X^n_3(x, y) = (1, 0), \qquad (5.26)$$


where $n$ means negative. The corresponding holomorphic functions are:

$$f^n_1(z) = z^2, \quad f^n_2(z) = z, \quad f^n_3(z) = 1. \qquad (5.27)$$


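For the above metrics the isometry group is $Isom(g^p) = O(3)$ respectively $Isom(g^n)
= SL(2, \mathbb{R}) \simeq SU(1, 1)$. But:


$$[X^n_1, X^n_2] = -X^n_1, \quad [X^n_2, X^n_3] = -X^n_3, \quad [X^n_3, X^n_1] = 2X^n_2$$

and hence $\{X^n_1, X^n_2, X^n_3\}$ is not a Milnor frame for $SL(2, \mathbb{R})$, [4, p. 52].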


It is known that the Cayley-Möbius map $g : \Delta^1 \to \mathbb{C}_+$:


$$g(z) = i\frac{1 + z}{1 - z}$$


is a bijection. Hence we have the "Killing" functions on the unit disk $g_j := f^n_j \circ g :
\Delta^1 \to \mathbb{C}$, $j = 1, 2, 3$ and the remarkable one is:


$$g_1(z) = -\left(\frac{1 + z}{1 - z}\right)^2 = -\left(1 + 4\sum_{n \geq 1}nz^n\right), \quad g_2(z) = g(z), \quad g_3(z) = 1.$$

In fact, $g_1 = -(4k + 1)$ with $k(z) = \frac{z}{(1 - z)^2}$ the Koebe function; hence the image
of $g_1$ is $\mathbb{C} \setminus [0, +\infty)$.
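
The relation between the Cayley-Möbius map and the Koebe function can be verified numerically. The sketch below assumes the reading $g(z) = i(1 + z)/(1 - z)$, $k(z) = z/(1 - z)^2$ and $g_1 = g^2$ (since $f^n_1(z) = z^2$). The helper names are illustrative.

```python
def cayley(z: complex) -> complex:
    """Cayley-Moebius map from the unit disk to the upper half-plane."""
    return 1j * (1 + z) / (1 - z)


def koebe(z: complex) -> complex:
    """Koebe function k(z) = z / (1 - z)^2."""
    return z / (1 - z) ** 2


def koebe_series(z: complex, terms: int = 400) -> complex:
    """Partial sum of sum_{n>=1} n z^n, which converges to k(z) for |z| < 1."""
    return sum(n * z ** n for n in range(1, terms + 1))


if __name__ == "__main__":
    for z in (0.2 + 0.1j, -0.5j, 0.6 - 0.3j):
        g1 = cayley(z) ** 2
        # Both differences should be at rounding-error level.
        print(abs(g1 + (1 + 4 * koebe(z))), abs(koebe(z) - koebe_series(z)))
```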

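Also, it is known that the map $\psi : \mathbb{C}_+ \to \mathbb{C}_+$, $\psi(z) = -\frac{1}{z} = -z^{-1}$, preserves
the hyperbolic geodesics of $(\mathbb{C}_+, g^n)$. Hence we have the functions $\psi_j := f^n_j \circ \psi :
\mathbb{C}_+ \to \mathbb{C}$, $j = 1, 2, 3$ and the remarkable one is:


$$\psi_1(z) = \frac{\bar{z}^2}{|z|^4} = z^{-2}.$$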


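The geodesic polar coordinates $(\rho, \varphi) \in (0, +\infty) \times \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ on the Poincaré
upper half-plane $(\mathbb{C}_+, g^n)$ are given by $\rho := dist_H(i, z)$, $\varphi = \arctan\frac{|z|^2 - 1}{2x}$ and
then:

$$x = \frac{\sinh\rho\cos\varphi}{\cosh\rho - \sinh\rho\sin\varphi}, \quad y = \frac{1}{\cosh\rho - \sinh\rho\sin\varphi}.$$


The function $f^n_1$ becomes:


$$f^n_1(z) = \frac{1}{(\cosh\rho - \sinh\rho\sin\varphi)^2}\left[(\sinh^2\rho\cos^2\varphi - 1) + (2\sinh\rho\cos\varphi)i\right].$$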


We can connect the above polar coordinate g to the Hopf invariant of a matrix. 
In [26, pp.153-154] the manifold C, is identified with the homogeneous space 
SP. = SO(2) \ SL(2, R) of positive definite binary quadratic forms of determinant 
1 via the map: z € Cy <> W, € SP2: 


0 i PP ae 
We= (41): A. Paice PA wee? TS =), 
y -x 1 y —X 1 


Hence $\varphi(z)$ is exactly the argument of the Hopf-invariant $h(W_z)$. If the trace
$Tr(W_z) = \frac{|z|^2 + 1}{y} > 2$ then the matrix $W_z$ is a hyperbolic one with real eigenvalues
$\lambda_\pm$. Let $e(z)$ be the eigenvalue which is larger than 1 and the norm of $W_z$ is, by
definition, $N(W_z) = e(z)^2$. We obtain:


$$N(W_z) = \frac{1}{4}\left[\frac{|z|^2 + 1}{y} + \sqrt{\left(\frac{|z|^2 + 1}{y}\right)^2 - 4}\right]^2.$$


On the vertical axis $Oy : x = 0$ the hyperbolic condition means $y \neq 1$ and then:


$$N(W_{iy}) = \left(\frac{y^2 + 1 + |y^2 - 1|}{2y}\right)^2.$$
Recall also the fundamental region $F$ of the modular group $PSL(2, \mathbb{Z})$:

$$F = \left\{z \in \mathbb{C}_+;\ |z| \geq 1,\ |\Re z| \leq \frac{1}{2}\right\}$$


and its right corner $\omega = \frac{1}{2} + i\frac{\sqrt{3}}{2} = e^{i\frac{\pi}{3}} \in F$ which is an elliptic point; then $N(W_\omega) =
3$. Its polar coordinates are provided by $\varphi(\omega) = 0$ and $\cosh\rho(\omega) = \frac{2}{\sqrt{3}}$, $\sinh\rho(\omega) =
\frac{1}{\sqrt{3}}$. Another remarkable element of $F$ is the complex Golden ratio $\phi_c = \frac{1}{2} + i\frac{\sqrt{5}}{2}$
from [9, p. 1231]. Some of its properties are related to the classical Golden ratio
$\phi = \frac{1 + \sqrt{5}}{2}$:


$$N(W_{\phi_c}) = \phi^2, \quad d_H(i, \phi_c) = \ln\phi, \quad \cos\varphi(\phi_c) = \frac{2}{\sqrt{5}}, \quad \sin\varphi(\phi_c) = \frac{1}{\sqrt{5}}, \quad \sinh\rho(\phi_c) = \frac{1}{2}, \quad \cosh\rho(\phi_c) = \frac{\sqrt{5}}{2}.$$


The angles of the hyperbolic triangle $\Delta\omega\phi_ci$ and their sum are:


$$\angle(\omega) = \frac{\pi}{3} = 60°, \quad \angle(\phi_c) = 90°, \quad \angle(i) = \arccos\frac{2}{\sqrt{5}} = 26.57°,$$


sum $= 176.57° < 180°$ and then its area is $S(\Delta\omega\phi_ci) = \pi -$ sum $= 3.43° < 60° =
\frac{\pi}{3} = S(F)$. The hyperbolic distance between $\omega$ and $\phi_c$ is given by: $\cosh[d_H(\omega, \phi_c)] =
\frac{4}{\sqrt{15}}$. Since $\ln\frac{5}{3} = 0.51 > 0.50 = \sqrt{5} - \sqrt{3}$ it follows that the hyperbolic distance
between $\omega$ and $\phi_c$ is greater than the corresponding Euclidean distance. The picture of
the Golden hyperbolic triangle $\Delta\omega\phi_ci$ (Fig. 1) was made by my colleague Marian
Ioan Munteanu:


Fig. 1 The Golden hyperbolic triangle 
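
The numerical values attached to the Golden hyperbolic triangle can be reproduced with a short computation. The sketch below is a reading of the garbled formulas under stated assumptions: curvature $-1$ on the upper half-plane, $\omega = \frac{1}{2} + i\frac{\sqrt{3}}{2}$, $\phi_c = \frac{1}{2} + i\frac{\sqrt{5}}{2}$, the standard distance formula $\cosh d_H(z, w) = 1 + \frac{|z - w|^2}{2\Im z\Im w}$, and geodesic polar coordinates centred at $i$. The function names are illustrative.

```python
import math


def hyperbolic_distance(z: complex, w: complex) -> float:
    """Distance in the upper half-plane with metric (dx^2 + dy^2) / y^2."""
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))


def from_geodesic_polar(rho: float, phi: float) -> complex:
    """Point of the upper half-plane with geodesic polar coordinates (rho, phi) around i."""
    d = math.cosh(rho) - math.sinh(rho) * math.sin(phi)
    return complex(math.sinh(rho) * math.cos(phi) / d, 1 / d)


OMEGA = complex(0.5, math.sqrt(3) / 2)
PHI_C = complex(0.5, math.sqrt(5) / 2)
GOLDEN = (1 + math.sqrt(5)) / 2

if __name__ == "__main__":
    d = hyperbolic_distance(OMEGA, PHI_C)
    print(d, abs(OMEGA - PHI_C))  # hyperbolic versus Euclidean distance
    print(from_geodesic_polar(math.log(GOLDEN), math.atan(0.5)))  # should be close to PHI_C
```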
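(5.6.3) For $K = 0$ and the Euclidean metric $g_{can} = dx^2 + dy^2 = |dz|^2 = dr^2 + r^2d\theta^2$
the basis of its Lie algebra $isom(g_{can})$ of Killing vector fields is:


$$X^c_1(x, y) = X^n_3(x, y) = (1, 0) = \cos\theta\frac{\partial}{\partial r} - \frac{\sin\theta}{r}\frac{\partial}{\partial\theta}, \quad X^c_2(x, y) = (0, 1) = \sin\theta\frac{\partial}{\partial r} + \frac{\cos\theta}{r}\frac{\partial}{\partial\theta}, \quad X^c_3(x, y) = X^p_3(x, y) = (-y, x) = \frac{\partial}{\partial\theta} \qquad (5.28)$$


with corresponding holomorphic functions and Lie brackets:


$$f^c_1(z) = 1, \quad f^c_2(z) = i, \quad f^c_3(z) = iz, \qquad [X^c_1, X^c_2] = 0, \quad [X^c_2, X^c_3] = -X^c_1, \quad [X^c_3, X^c_1] = -X^c_2. \qquad (5.29)$$


The isometry group $Isom(g_{can})$ is the affine orthogonal group $AO(n) := \mathbb{R}^n \rtimes O(n)$
with $n = 2$. In polar coordinates the Wirtinger vector field is $\frac{\partial}{\partial z} = \frac{e^{-i\theta}}{2}\left(\frac{\partial}{\partial r} - \frac{i}{r}\frac{\partial}{\partial\theta}\right)$. The
function $f^c_3 = f^p_3$ is the skew-symmetric linear transformation with the matrix:

$$J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \nabla^{g_{can}}X^c_3 \qquad (5.30)$$


which is the trigonometrical rotation of the plane. The skew-symmetric endomorphisms
associated to the above Killing vector fields are: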


P P 
g Pp —2x g Pp gh Pp 1—x?-y? 
V9 XT = Tare Viti=qenl va ea 
vaixt = Sty, vetxy = J, VIXS = FT. 
5 


The square of their norm is: 


c2(14+x24+y?)? c2(14x2+y?)? a 


A(x?+y? e 
WX? = PO, XE = XE = 1, 1X2? = x? + y. 


| |X? ||? = (y?=x?+1)?+4 (xy)? ; | X2 || = (x?=y? +1)? +4 (xy)? 
c2(1+x2+y2)? ’ 


With the curvature formula of Example 3.5 we have K ; for all above f. The only 
non-zero are: 


=4 


K pe (z) = Kye (y= (+ lei)?" 


-1 
Gs ppr 784 = 


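Fix $\lambda \in C^\infty(\mathbb{R}^2)$ and let $X_\lambda$ be a conformal Killing vector field with $\lambda$ as factor:
$\mathcal{L}_{X_\lambda}g = 2\lambda g$. Then, for a constant $\lambda \in \mathbb{R}$ the set of $X_\lambda$ depends on a real smooth
function $f$ through:
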

Xi, y) = Ox+ fO),AY- FO), A@ =Az—ifG)+ FO). 6.31) 
Remark that the metrics $g^p$, $g_{cigar}$ and $g_{can}$ of $\mathbb{R}^2$ are all conformally equivalent:
Ic = FSG Seigar: These three metrics share the Killing rotation X . = X§ with 
the isothermal metric of the Enneper surface. 

Remark that the punctured plane $\mathbb{R}^2 \setminus \{O(0, 0)\}$ with the pseudo-Riemannian
metric:


$$g = \frac{1}{x^2 + y^2}(dx \otimes dy + dy \otimes dx) \qquad (5.32)$$

has the radial Killing vector field $X^r(x, y) = (x, y) = X^n_2(x, y) = r\frac{\partial}{\partial r}$ with the
holomorphic function $f^r(z) = z$. In fact, $X^r$ is parallel with respect to $g$.

This correspondence between Killing vector fields and holomorphic functions 
is already known, see for example [25, p. 143], while for the conformal Killing 
vector fields this relationship is discussed in [11, p. 88]. More generally, inspired by 
almost Ricci solitons we consider given the triple (g, X, f) on M and its associated 
symmetric covariant tensor field: 


(g, X, fy" := Lyg + 2Ricg +2fg (5.33) 
whose vanishing characterizes an almost Ricci soliton; here Ric, is the Ricci tensor 


of the metric g. Hence, in dimension two the vector field X is a conformal Killing 
field: 


$$\mathcal{L}_Xg = -2(K + f)g \qquad (5.34)$$


and we can apply the last part of Theorem 5.5: 


Proposition 5.7 If $(g, X, f)$ is an isothermal almost Ricci soliton on $M^2$ then $f_X$ is
a holomorphic function.


Example 5.8 The cigar soliton is a steady ($f = 0$) gradient Ricci soliton with
$X^{cigar} = \nabla_{g_{cigar}}f^{cigar}$ with:


$$f^{cigar}(x, y) = -\ln(1 + x^2 + y^2), \quad X^{cigar}(x, y) = -2(x, y) = -2X^r. \qquad (5.35)$$


$X^{cigar}$ is also a homothetic vector field for the Euclidean metric $g_{can}$ since we
make $\lambda = -2$ and $f = 0$ in (5.31).
6 Subgeodesic Correspondence of Metrics and Connections


In [2, p. 173] we introduce a transformation of Riemannian metrics as follows. 


Definition 6.1 The metrics $g$, $\tilde{g}$ are called in subgeodesic correspondence if there
exists a vector field $\xi$ and a 1-form $\omega$ such that their Levi-Civita connections are
related by:


$$\tilde{\Gamma}^k_{ij} = \Gamma^k_{ij} + \delta^k_i\omega_j + \delta^k_j\omega_i + g_{ij}\xi^k. \qquad (6.1)$$


For our two-dimensional setting $M^2$ a straightforward computation gives the
transformation of the Christoffel-Hopf data:


$$\tilde{h}^k = h^k + h^g\xi^k + (\delta^k_1 - i\delta^k_2)(\omega_1 - i\omega_2). \qquad (6.2)$$


Example 6.2 (i) If $\xi = 0$ then the given metrics are in geodesic or projective
equivalence and then:


$$\tilde{h}^k = h^k + (\delta^k_1 - i\delta^k_2)(\omega_1 - i\omega_2). \qquad (6.3)$$

(ii) If the metrics are conformally equivalent, i.e. $\tilde{g} = e^{2u}g$, then we have the
subgeodesic correspondence with $\xi = -\nabla_gu$ and $\omega = du$. Then:

$$\tilde{h}^k = h^k - h^g(\nabla_gu)^k + 2(\delta^k_1 - i\delta^k_2)\partial u \qquad (6.4)$$


with $\partial$ given by (2.14).


We now search for a projective equivalence such that $\tilde{h}^1 = \tilde{h}^2 = 0$. From (6.3)
we obtain the initial $h^1$, $h^2$:


$$h^k = (\delta^k_1 - i\delta^k_2)(\varphi_1 - i\varphi_2), \quad \varphi := -\omega. \qquad (6.5)$$

If $\varphi$ is the Euclidean gradient of the scalar field $f$, i.e. $(\varphi_1 = f_x, \varphi_2 = f_y)$, then:

$$h^k = 2(\delta^k_1 - i\delta^k_2)\partial f \qquad (6.6)$$


and let us say that $(g, f)$ is a projective pair. A method to obtain a projective Euclidean
space is given by:


Proposition 6.3 If $(g, f)$ is a projective pair such that $\Gamma^1_{22} = \Gamma^2_{11} = 0$ then $(M^2, g)$
is a projective Euclidean space.


Proof The explicit form of (6.6) is:


$$\Gamma^1_{11} - \Gamma^1_{22} = 2f_x = 2\Gamma^2_{12}, \quad \Gamma^2_{22} - \Gamma^2_{11} = 2f_y = 2\Gamma^1_{12} \qquad (6.7)$$

and then the hypothesis implies the Eq. (2.18). We note that the vanishing of $\Gamma^1_{22}$ and
$\Gamma^2_{11}$ means that the lines $x =$ constant and $y =$ constant are geodesics and that
the transformation $\tilde{x} = \alpha(x)$, $\tilde{y} = \beta(y)$ preserves the equations $\Gamma^1_{22} = \Gamma^2_{11} = 0$.


Example 6.4 (i) With P4,=T7, =0 and f = f“%" from (6.7) we derive the 
Christoffel symbols (2.21) of gpro;- 

(ii) The projective connection associated to the linear connection 
V = (Ti)i<ijrsn is, [16, p. 467]: 


k k Sr 8j ; i 
My = Tj - = 7 eae = hi Tas > r, (6.8) 
i=1 


In dimension 2 this means: 


2 1 
(eee ee ee 8 
My =, My =Pi- 31, Ma =x — 322 
and a straightforward computation gives for the cigar metric: 
1 2 1 as 
My, = —Wy, = 31139 = 30 4x2 4)" 
2, = —m!, = 31 sd (6.10) 
22= ig eh ; 


3 +x? + y?)" 


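As a generalization of conditions (6.7) we allow the symmetric linear connection $\Gamma$
to be a non-metric connection and let $a$ and $b$ be two smooth functions such that (2.18)
holds:


$$\Gamma^2_{22} = 2\Gamma^1_{12} = 2a, \quad \Gamma^1_{11} = 2\Gamma^2_{12} = 2b, \quad \Gamma^1_{22} = \Gamma^2_{11} = 0. \qquad (6.11)$$

Then the curvature of $\Gamma$ is zero if and only if:

$$\begin{cases} R^1_{112} = a_x - 2b_y + ab = 0 \\ R^1_{212} = -a_y + a^2 = 0 \\ R^2_{112} = b_x - b^2 = 0 \\ R^2_{212} = 2a_x - b_y - ab = 0. \end{cases} \qquad (6.12)$$


We have firstly the trivial solution $a = b = 0$ and we get the Euclidean connection
$\Gamma = 0$. The non-trivial solution of system (6.12) is parametrised by two real constants
$\alpha$, $\beta$:

$$a = -\frac{1}{y + \alpha x + \beta}, \quad b = -\frac{\alpha}{y + \alpha x + \beta}. \qquad (6.13)$$


The classical equations of geodesics for this linear connection are:


$$\ddot{x} = \frac{2\dot{x}(\alpha\dot{x} + \dot{y})}{y + \alpha x + \beta}, \quad \ddot{y} = \frac{2\dot{y}(\alpha\dot{x} + \dot{y})}{y + \alpha x + \beta}. \qquad (6.14)$$


This differential system integrates to:

$$\dot{x} = C_1(y + \alpha x + \beta)^2, \quad \dot{y} = C_2(y + \alpha x + \beta)^2, \quad C_1, C_2 \in \mathbb{R} \setminus \{0\}. \qquad (6.15)$$


For $\beta = 0$ the final solution is given by lines through the origin parametrised as:


$$x(t) = \frac{A}{t}, \quad y(t) = \frac{B}{t}, \quad t \in (0, +\infty). \qquad (6.16)$$


If $\alpha = 1$ and $\beta = 0$ then the linear connection (6.11) is an example of a locally
homogeneous linear connection (3.19) from [18, p. 182].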
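
The flatness system and its geodesics can be tested numerically. The sketch below assumes the reading $a = -1/(y + \alpha x + \beta)$, $b = -\alpha/(y + \alpha x + \beta)$ of (6.13), the Christoffel symbols $\Gamma^1_{11} = 2b$, $\Gamma^1_{12} = a$, $\Gamma^2_{12} = b$, $\Gamma^2_{22} = 2a$ of (6.11), and central finite differences for the partial derivatives. The function names and the step size are illustrative choices.

```python
def coefficients(x: float, y: float, alpha: float, beta: float) -> tuple:
    """Functions (a, b) of the assumed non-trivial solution (6.13)."""
    d = y + alpha * x + beta
    return -1.0 / d, -alpha / d


def flatness_residuals(x, y, alpha, beta, step=1e-5) -> tuple:
    """Left-hand sides of the four equations (6.12), via central differences."""
    a = lambda u, v: coefficients(u, v, alpha, beta)[0]
    b = lambda u, v: coefficients(u, v, alpha, beta)[1]
    a_x = (a(x + step, y) - a(x - step, y)) / (2 * step)
    a_y = (a(x, y + step) - a(x, y - step)) / (2 * step)
    b_x = (b(x + step, y) - b(x - step, y)) / (2 * step)
    b_y = (b(x, y + step) - b(x, y - step)) / (2 * step)
    a0, b0 = coefficients(x, y, alpha, beta)
    return (a_x - 2 * b_y + a0 * b0, -a_y + a0 * a0,
            b_x - b0 * b0, 2 * a_x - b_y - a0 * b0)


def geodesic_residuals(A, B, alpha, t) -> tuple:
    """Residuals of the geodesic equations along x = A/t, y = B/t (case beta = 0)."""
    x, y = A / t, B / t
    dx, dy = -A / t ** 2, -B / t ** 2
    ddx, ddy = 2 * A / t ** 3, 2 * B / t ** 3
    a, b = coefficients(x, y, alpha, 0.0)
    r1 = ddx + 2 * b * dx * dx + 2 * a * dx * dy   # Gamma^1_11 = 2b, Gamma^1_12 = a
    r2 = ddy + 2 * b * dx * dy + 2 * a * dy * dy   # Gamma^2_12 = b, Gamma^2_22 = 2a
    return r1, r2


if __name__ == "__main__":
    print(flatness_residuals(0.3, 0.4, 2.0, 1.0))
    print(geodesic_residuals(1.5, -0.5, 1.0, 2.0))
```

All printed residuals should be close to zero, away from the singular line $y + \alpha x + \beta = 0$.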


7 Conclusion 


As the main benefit of this chapter we point out the richness of information provided
by the Hopf-Levi-Civita data in three remarkable 2D differential geometries, namely
Riemannian, Kähler and Hessian. This approach is open to other interesting topics
from two-dimensional geometries and we hope to develop it in future papers.
Some Applications of Fibonacci
and Lucas Numbers


Cristina Flaut, Diana Savin, and Gianina Zaharia


Abstract In this paper, we provide new applications of Fibonacci and Lucas num- 
bers. In some circumstances, we find algebraic structures on some sets defined with 
these numbers, we generalized Fibonacci and Lucas numbers by using an arbitrary 
binary relation over the real fields instead of addition of the real numbers and we 
give properties of the new obtained sequences. Moreover, by using some relations 
between Fibonacci and Lucas numbers, we provide a method to find new examples 
of split quaternion algebras and we give new properties of these elements. 
1 Introduction


There have been numerous papers devoted to the study of the properties and
applications of Fibonacci and Lucas sequences. See, for example, [2, 3, 5, 7], etc. Due to
this fact, obtaining new results in this direction is not always an easy problem. Our
purpose in this chapter is to provide new applications of these sequences. Therefore,
we studied whether some algebraic structures can be defined by using Fibonacci and Lucas
elements and we provided some new properties and relations of them.

Moreover, by taking into consideration some known relations between these numbers,
we give a method to find new examples of split quaternion algebras. This method
can be an easy alternative to using the properties of quadratic forms.

Let $(f_n)_{n \geq 0}$ be the Fibonacci sequence


$$f_n = f_{n-1} + f_{n-2}, \quad n \geq 2, \quad f_0 = 0;\ f_1 = 1,$$

and $(l_n)_{n \geq 0}$ be the Lucas sequence

$$l_n = l_{n-1} + l_{n-2}, \quad n \geq 2, \quad l_0 = 2;\ l_1 = 1.$$


Let $n$ be an arbitrary positive integer and let $p, q$ be two arbitrary integers. In the
paper [2], we introduced the sequence $(g_n)_{n \geq 1}$, where


$$g_{n+1} = pf_n + ql_{n+1}, \quad n \geq 0, \qquad (1.1)$$


with $g_0 = p + 2q$, $g_1 = q$. We remark that $g_n = g_{n-1} + g_{n-2}$.

The terms of this sequence are called generalized Fibonacci-Lucas numbers. To avoid
confusion, from now on, we will use the notation $g_n^{p,q}$ instead of $g_n$.

Let $\mathbb{H}(\alpha, \beta)$ be the generalized real quaternion algebra, the algebra of the elements
of the form $a = a_1 \cdot 1 + a_2e_2 + a_3e_3 + a_4e_4$, where $a_i \in \mathbb{R}$, $i \in \{1, 2, 3, 4\}$, and the
elements of the basis $\{1, e_2, e_3, e_4\}$ satisfy the rules given in the
multiplication table below:


$$\begin{array}{c|cccc} \cdot & 1 & e_2 & e_3 & e_4 \\ \hline 1 & 1 & e_2 & e_3 & e_4 \\ e_2 & e_2 & \alpha & e_4 & \alpha e_3 \\ e_3 & e_3 & -e_4 & \beta & -\beta e_2 \\ e_4 & e_4 & -\alpha e_3 & \beta e_2 & -\alpha\beta \end{array}$$


We denote by $t(a)$ and $n(a)$ the trace and the norm of a quaternion $a$. The norm
of a generalized quaternion has the following expression

$$n(a) = a_1^2 - \alpha a_2^2 - \beta a_3^2 + \alpha\beta a_4^2. \qquad (1.2)$$

If, for $x \in \mathbb{H}(\alpha, \beta)$, the relation $n(x) = 0$ implies $x = 0$, then the algebra
$\mathbb{H}(\alpha, \beta)$ is called a division algebra, otherwise the quaternion algebra is called a
split algebra. For $\alpha = \beta = -1$, we obtain the real division algebra $\mathbb{H}$.
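
The multiplication rules and the norm (1.2) can be made concrete with a small sketch. It assumes the basis labelling $\{1, e_2, e_3, e_4\}$ with $e_2^2 = \alpha$, $e_3^2 = \beta$, $e_2e_3 = -e_3e_2 = e_4$, and represents a quaternion as a 4-tuple of coefficients. The function names are illustrative.

```python
def multiply(a, b, alpha, beta):
    """Product in H(alpha, beta) of coefficient tuples over the basis (1, e2, e3, e4)."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (
        a1 * b1 + alpha * a2 * b2 + beta * a3 * b3 - alpha * beta * a4 * b4,
        a1 * b2 + a2 * b1 - beta * a3 * b4 + beta * a4 * b3,
        a1 * b3 + a3 * b1 + alpha * a2 * b4 - alpha * a4 * b2,
        a1 * b4 + a4 * b1 + a2 * b3 - a3 * b2,
    )


def conjugate(a):
    """Conjugate quaternion: the sign of the three non-scalar coefficients is flipped."""
    return (a[0], -a[1], -a[2], -a[3])


def norm(a, alpha, beta):
    """Norm n(a) = a1^2 - alpha*a2^2 - beta*a3^2 + alpha*beta*a4^2, as in (1.2)."""
    a1, a2, a3, a4 = a
    return a1 * a1 - alpha * a2 * a2 - beta * a3 * a3 + alpha * beta * a4 * a4


if __name__ == "__main__":
    a = (1, 2, 3, 4)
    # In Hamilton's algebra (alpha = beta = -1) the product a * conj(a) is n(a) times 1.
    print(multiply(a, conjugate(a), -1, -1), norm(a, -1, -1))
    # For alpha = 1 the non-zero element 1 + e2 has norm 0, so H(1, beta) is split.
    print(norm((1, 1, 0, 0), 1, 5))
```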
2 Preliminaries


First of all, we recall some elementary properties of the Fibonacci and Lucas numbers, 
properties which will be used in this chapter. 

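Let $(f_n)_{n \geq 0}$ be the Fibonacci sequence and let $(l_n)_{n \geq 0}$ be the Lucas sequence. Let
$\alpha = \frac{1 + \sqrt{5}}{2}$ and $\beta = \frac{1 - \sqrt{5}}{2}$ be two real numbers.

The following formulae are well known:


Binet's formula for the Fibonacci sequence


$$f_n = \frac{\alpha^n - \beta^n}{\alpha - \beta} = \frac{\alpha^n - \beta^n}{\sqrt{5}}, \quad n \in \mathbb{N}.$$


Binet's formula for the Lucas sequence

$$l_n = \alpha^n + \beta^n, \quad n \in \mathbb{N}.$$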

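Proposition 2.1 [1] Let $(f_n)_{n \geq 0}$ be the Fibonacci sequence and let $(l_n)_{n \geq 0}$ be the
Lucas sequence. Then the following properties hold:


(i) $5 \mid f_n$ if and only if $5 \mid n$.
(ii) $f_{2n} = l_nf_n$, $n \in \mathbb{N}$.
(iii) $f_n^2 + f_{n+1}^2 = f_{2n+1}$, $n \in \mathbb{N}$.
(iv) $f_n^2 - f_{n+1}f_{n-1} = (-1)^{n-1}$, $n \in \mathbb{N}$.
(v) $f_{-n} = (-1)^{n+1}f_n$, $n \in \mathbb{N}$.
(vi) $l_{-n} = (-1)^nl_n$, $n \in \mathbb{N}$.
(vii) $l_n^2 = 5f_n^2 + 4(-1)^n$, $n \in \mathbb{N}$.
(viii) $l_{2n} = l_n^2 + 2(-1)^{n+1}$, $n \in \mathbb{N}$.
(ix) $l_{2n}l_{2n+2} - 5f_{2n+1}^2 = 1$, $n \in \mathbb{N}$.
(x) $f_{2n} + f_n^2 = 2f_nf_{n+1}$, $n \in \mathbb{N}$.
(xi) $f_{2n} - f_n^2 = 2f_nf_{n-1}$, $n \in \mathbb{N}$.
(xii) $l_n^2 - f_n^2 = 4f_{n-1}f_{n+1}$, $n \in \mathbb{N}$.
(xiii) $f_{2n} = f_{n+1}^2 - f_{n-1}^2$, $n \in \mathbb{N}$.
(xiv) $l_{4n} = 5f_{2n}^2 + 2$, $n \in \mathbb{N}$.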
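
The identities of Proposition 2.1 can be spot-checked with exact integer arithmetic. The sketch below tests one possible reading of each item, (i) to (xiv), for small $n$; it assumes that negative indices are defined by running the recurrences backwards. The function names are illustrative.

```python
def fib_lucas(n: int) -> tuple:
    """Return (f_n, l_n) for any integer n, extending both recurrences backwards for n < 0."""
    f_cur, f_next = 0, 1   # (f_0, f_1)
    l_cur, l_next = 2, 1   # (l_0, l_1)
    if n >= 0:
        for _ in range(n):
            f_cur, f_next = f_next, f_cur + f_next
            l_cur, l_next = l_next, l_cur + l_next
    else:
        for _ in range(-n):
            f_cur, f_next = f_next - f_cur, f_cur
            l_cur, l_next = l_next - l_cur, l_cur
    return f_cur, l_cur


def identities_hold(n: int) -> bool:
    """Check the assumed reading of items (i)-(xiv) for one index n >= 0."""
    f = lambda k: fib_lucas(k)[0]
    l = lambda k: fib_lucas(k)[1]
    sign = (-1) ** n
    checks = [
        (f(n) % 5 == 0) == (n % 5 == 0),                    # (i)
        f(2 * n) == l(n) * f(n),                            # (ii)
        f(n) ** 2 + f(n + 1) ** 2 == f(2 * n + 1),          # (iii)
        f(n) ** 2 - f(n + 1) * f(n - 1) == -sign,           # (iv)
        f(-n) == -sign * f(n),                              # (v)
        l(-n) == sign * l(n),                               # (vi)
        l(n) ** 2 == 5 * f(n) ** 2 + 4 * sign,              # (vii)
        l(2 * n) == l(n) ** 2 - 2 * sign,                   # (viii)
        l(2 * n) * l(2 * n + 2) - 5 * f(2 * n + 1) ** 2 == 1,  # (ix)
        f(2 * n) + f(n) ** 2 == 2 * f(n) * f(n + 1),        # (x)
        f(2 * n) - f(n) ** 2 == 2 * f(n) * f(n - 1),        # (xi)
        l(n) ** 2 - f(n) ** 2 == 4 * f(n - 1) * f(n + 1),   # (xii)
        f(2 * n) == f(n + 1) ** 2 - f(n - 1) ** 2,          # (xiii)
        l(4 * n) == 5 * f(2 * n) ** 2 + 2,                  # (xiv)
    ]
    return all(checks)


if __name__ == "__main__":
    print(all(identities_hold(n) for n in range(0, 40)))
```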


Proposition 2.2 [4] Let $K$ be a field. Then, the quaternion algebra $\mathbb{H}_K(a, b)$ is a
split algebra if and only if the conic


$$C(a, b) : ax^2 + by^2 = z^2$$


has a rational point over $K^*$, i.e. there are $x_0, y_0, z_0 \in K^*$ such that $ax_0^2 + by_0^2 =
z_0^2$.


3 Main Results 


To begin, we want to find some properties of the Fibonacci numbers $f_{5n}$, $n \in \mathbb{N}$.
We start with some examples.


Example 3.1 $f_{10} = 55 = 11f_5$, $f_{15} = 610 = 11f_{10} + f_5 = 122f_5$, $f_{20} = 11f_{15} +
f_{10} = 6765 = 1353f_5$.

Using the computer algebra system Magma [8] and its function
Fibonacci(n), we obtain another example.

We get $f_{65} = 17167680177565$, $f_{35} = 9227465$ and $f_{65}$ div $f_{35}$ is 1860497, $f_{65}$
mod $f_{35}$ is 9227460 and 9227460 div $f_5$ is 1845492. Therefore, it results that


$$f_{65} = 1860497f_{35} + 9227460 = 1860497f_{35} + 1845492f_5.$$
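
The Magma computation can be reproduced without a computer algebra system. The sketch below uses plain Python integers and an iterative Fibonacci routine standing in for Magma's Fibonacci(n); the function name `fibonacci` is an illustrative choice.

```python
def fibonacci(n: int) -> int:
    """Return f_n for n >= 0 with f_0 = 0, f_1 = 1, using exact integer arithmetic."""
    current, following = 0, 1
    for _ in range(n):
        current, following = following, current + following
    return current


if __name__ == "__main__":
    f5, f35, f65 = fibonacci(5), fibonacci(35), fibonacci(65)
    quotient, remainder = divmod(f65, f35)   # f65 div f35 and f65 mod f35
    print(f65, f35, quotient, remainder, remainder // f5)
    # The decomposition of f65 stated in the text:
    print(f65 == quotient * f35 + (remainder // f5) * f5)
```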
Proposition 3.2 The set

$$A = \{\alpha f_{5n} \mid n \in \mathbb{N}, \alpha \in \mathbb{Z}\}$$


is a commutative non-unitary ring, with addition and multiplication.


Proof We remark that $f_0 = 0 \in A$ is the identity element for addition on $A$. It is
clear that the addition on $A$ is commutative.


We consider the Fibonacci number $f_{5n}$. If $n \neq 0$, there are two cases.

Case 1. If $n$ is even, then $n = 2k$, where $k \in \mathbb{N} - \{0\}$. Applying Proposition 2.1
(ii), we have that $f_{5n} = f_{10k} = l_{5k}f_{5k}$. If we denote $l_{5k} = a \in \mathbb{N} - \{0\}$, we obtain
$f_{5n} = f_{10k} = af_{5k}$ and, applying Proposition 2.1 (i), we obtain $f_{5k} = a'f_5$, where
$a' \in \mathbb{N} - \{0\}$. It results that


$$f_{5n} = f_{10k} = af_{5k} = aa'f_5, \quad \text{with } a, a' \in \mathbb{N} - \{0\}. \qquad (3.1)$$

Case 2. If $n$ is odd, then $n = 2k + 1$, where $k \in \mathbb{N} - \{0\}$. We have


$$f_{5n} = f_{10k+5} = f_{10k+4} + f_{10k+3} = 2f_{10k+3} + f_{10k+2} = 3f_{10k+2} + 2f_{10k+1} = 5f_{10k+1} + 3f_{10k} = f_{10k+1}f_5 + 3f_{10k}.$$


If we denote $f_{10k+1} = a \in \mathbb{N} - \{0\}$, we obtain $f_{5n} = af_5 + 3f_{10k}$. Applying
Proposition 2.1 (i), we get $f_{10k} = bf_5$, where $b \in \mathbb{N} - \{0\}$. We get


$$f_{5n} = f_{10k+5} = af_5 + 3f_{10k} = (a + 3b)f_5, \quad \text{with } a, b \in \mathbb{N} - \{0\}. \qquad (3.2)$$


From the relations (3.1) and (3.2), it results that $\alpha f_{5n} - \beta f_{5m} = \gamma f_{5l}$, for each $n, m \in
\mathbb{N}$ and for each $\alpha, \beta \in \mathbb{Z}$, where $l \in \mathbb{N}$, $\gamma \in \mathbb{Z}$. Therefore $(A, +)$ is a subgroup of the
group $(\mathbb{Z}, +)$.

Applying Proposition 2.1 (i), we obtain that $\alpha f_{5n} \cdot \beta f_{5m} = \delta f_{5r}$, for each $n, m \in \mathbb{N}$
and for each $\alpha, \beta \in \mathbb{Z}$, where $r \in \mathbb{N}$, $\delta \in \mathbb{Z}$. From here, it is clear that the multiplication
on $A$ is commutative.

Therefore $(A, +, \cdot)$ is a commutative non-unitary subring of the ring $(\mathbb{Z}, +, \cdot)$.

Using the above proposition, we obtain the results below.


Proposition 3.3 We consider $(g_n^{p,q})_{n\ge 1}$ the generalized Fibonacci-Lucas numbers and the set
$$M=\Big\{\sum_{i=1}^{n}g_{5n_i}^{p_i,5q_i}\ \Big|\ n_i\in\mathbb N,\ p_i,q_i\in\mathbb Z\Big\}\cup\{0\}.$$

Then, the following relations are true:

(i) $(M,+)$ is a subgroup of the group $(5\mathbb Z,+)$.
(ii) $M$ is a bilateral ideal of the ring $(\mathbb Z,+,\cdot)$.

Proof Using Proposition 2.1 (i), we have $5\mid g_{5n}^{p,5q}$, for all $n\in\mathbb N$ and $p,q\in\mathbb Z$. Let $n,m\in\mathbb N-\{0\}$ and $p,q,p',q',\alpha,\beta\in\mathbb Z$. We obtain that
$$\alpha g_{5n}^{p,5q}+\beta g_{5m}^{p',5q'}=g_{5n}^{\alpha p,5\alpha q}+g_{5m}^{\beta p',5\beta q'}.$$

From here, we obtain conditions (i) and (ii).


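Proposition 3.4 With the above notations, the following statements are true:

(1) The quaternion algebra $\mathbb H_{\mathbb Q}(f_{10n+5},-1)$ is a split algebra over $\mathbb Q$, $n\in\mathbb N$.

(2) The quaternion algebra $\mathbb H_{\mathbb Q}\left(\frac15l_{20n},-\frac25\right)$ is a split algebra over $\mathbb Q$, $n\in\mathbb N$.

(3) The quaternion algebra $\mathbb H_{\mathbb Q}\left(f_{n+1}f_{n-1},(-1)^{n-1}\right)$ is a split algebra over $\mathbb Q$, $n\in\mathbb N$.

(4) The quaternion algebra $\mathbb H_{\mathbb Q}\left(5,(-1)^{n}\right)$ is a split algebra over $\mathbb Q$, $n\in\mathbb N$.

(5) The quaternion algebra $\mathbb H_{\mathbb Q}(l_{2n}l_{2n+2},-5)$ is a split algebra over $\mathbb Q$, $n\in\mathbb N$.

(6) The quaternion algebra $\mathbb H_{\mathbb Q}(2f_nf_{n+1},-f_{2n})$ is a split algebra over $\mathbb Q$, $n\in\mathbb N$.

(7) The quaternion algebra $\mathbb H_{\mathbb Q}(f_{2n},-2f_nf_{n-1})$ is a split algebra over $\mathbb Q$, $n\in\mathbb N$.

(8) The quaternion algebras $\mathbb H_{\mathbb Q}(f_{n-1}f_{n+1},1)$ and $\mathbb H_{\mathbb Q}(1,-f_{n-1}f_{n+1})$ are split algebras over $\mathbb Q$, $n\in\mathbb N$.

(9) The quaternion algebra $\mathbb H_{\mathbb Q}(f_{2n},1)$ is a split algebra over $\mathbb Q$, $n\in\mathbb N$.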


Proof (1) Applying Proposition 3.4 (i) from [9], for $n\to 5n+2$, we obtain that the quaternion algebra $\mathbb H_{\mathbb Q}(f_{10n+5},-1)$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.

(2) For this purpose, we use Proposition 2.2. Therefore, we study if the equation
$$\tfrac15l_{20n}x^2-\tfrac25y^2=z^2 \quad (3.3)$$
has rational solutions. From Proposition 2.1 (xiv), we have $l_{20n}=5f_{10n}^2+2$, for all $n\in\mathbb N$. It results $\frac15l_{20n}-\frac25=f_{10n}^2$ and then $(x_0,y_0,z_0)=(1,1,f_{10n})\in\mathbb Q\times\mathbb Q\times\mathbb Q$ is a rational solution of Eq. (3.3). We obtain that the quaternion algebra $\mathbb H_{\mathbb Q}\left(\frac15l_{20n},-\frac25\right)$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.

(3) Using the same idea, we study if the equation
$$f_{n+1}f_{n-1}x^2+(-1)^{n-1}y^2=z^2 \quad (3.4)$$
has rational solutions. From Proposition 2.1 (iv), it results that $f_n^2=f_{n+1}f_{n-1}+(-1)^{n-1}$, for all $n\in\mathbb N$, therefore $(1,1,f_n)$ is a rational solution of Eq. (3.4). We obtain that the quaternion algebra $\mathbb H_{\mathbb Q}\left(f_{n+1}f_{n-1},(-1)^{n-1}\right)$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.

(4) From Proposition 2.1 (vii), we have $l_n^2=5f_n^2+4(-1)^n$ and it results that the equation $5x^2+(-1)^{n}y^2=z^2$ has $(f_n,2,l_n)$ as a rational solution. Therefore, the quaternion algebra $\mathbb H_{\mathbb Q}\left(5,(-1)^{n}\right)$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.

(5) We study if the equation
$$l_{2n}l_{2n+2}x^2-5y^2=z^2 \quad (3.5)$$
has rational solutions. From Proposition 2.1 (ix), we have $l_{2n}l_{2n+2}-5f_{2n+1}^2=1$, $n\in\mathbb N$. It results that the triple $(1,f_{2n+1},1)$ is a rational solution of Eq. (3.5). Therefore, the quaternion algebra $\mathbb H_{\mathbb Q}(l_{2n}l_{2n+2},-5)$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.

(6) We study if the equation
$$2f_nf_{n+1}x^2-f_{2n}y^2=z^2 \quad (3.6)$$
has rational solutions. From Proposition 2.1 (x), we have that the triple $(1,1,f_n)$ is a rational solution of Eq. (3.6). Therefore, the quaternion algebra $\mathbb H_{\mathbb Q}(2f_nf_{n+1},-f_{2n})$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.

(7) From Proposition 2.1 (xi), the equation
$$f_{2n}x^2-2f_nf_{n-1}y^2=z^2 \quad (3.7)$$
has the rational solution $(1,1,f_n)$, therefore the quaternion algebra $\mathbb H_{\mathbb Q}(f_{2n},-2f_nf_{n-1})$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.
(8) From Proposition 2.1 (xii), the equation
$$f_{n-1}f_{n+1}x^2+y^2=z^2 \quad (3.8)$$
has the rational solution $(2,f_n,l_n)$, therefore the quaternion algebra $\mathbb H_{\mathbb Q}(f_{n-1}f_{n+1},1)$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$. In the same way, the equation
$$x^2-f_{n-1}f_{n+1}y^2=z^2 \quad (3.9)$$
has the rational solution $(l_n,2,f_n)$, therefore the quaternion algebra $\mathbb H_{\mathbb Q}(1,-f_{n-1}f_{n+1})$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.
(9) From Proposition 2.1 (xiii), the equation
$$f_{2n}x^2+y^2=z^2 \quad (3.10)$$
has the rational solution $(1,f_{n-1},f_{n+1})$, therefore the quaternion algebra $\mathbb H_{\mathbb Q}(f_{2n},1)$ is a split algebra over $\mathbb Q$, for all $n\in\mathbb N$.


Let $\mathbb H(\alpha,\beta)$ be the generalized real quaternion algebra with the basis $\{1,e_2,e_3,e_4\}$, $(f_n)_{n\ge 0}$ be the Fibonacci sequence and $(l_n)_{n\ge 0}$ be the Lucas sequence. We consider the quaternions
$$F_n=f_n\cdot 1+f_{n+1}e_2+f_{n+2}e_3+f_{n+3}e_4$$
and
$$L_n=l_n\cdot 1+l_{n+1}e_2+l_{n+2}e_3+l_{n+3}e_4,$$
called Fibonacci quaternion and, respectively, Lucas quaternion.
Proposition 3.5 With the above notations, in $\mathbb H(\alpha,\beta)$, the following relations are true:

(1) $5n(F_n)=n(L_n)$, $n\in\mathbb N$.
(2) $5(f_{2n+1}+f_{2n+5})=l_{2n}+l_{2n+2}+l_{2n+4}+l_{2n+6}$, $n\in\mathbb N$.
(3) $n(F_n+L_n)=n(F_n)+n(L_n)+2(f_{2n+7}-f_{2n-1})$, $n\in\mathbb N$.

Proof (1) From Proposition 2.1 (vii), we have
$$n(L_n)=l_n^2+l_{n+1}^2+l_{n+2}^2+l_{n+3}^2=5f_n^2+4(-1)^n+5f_{n+1}^2+4(-1)^{n+1}+5f_{n+2}^2+4(-1)^{n+2}+5f_{n+3}^2+4(-1)^{n+3}=5n(F_n).$$

(2) From Proposition 2.1 (viii), we have
$$n(L_n)=l_n^2+l_{n+1}^2+l_{n+2}^2+l_{n+3}^2=l_{2n}-2(-1)^{n+1}+l_{2n+2}-2(-1)^{n+2}+l_{2n+4}-2(-1)^{n+3}+l_{2n+6}-2(-1)^{n+4}=l_{2n}+l_{2n+2}+l_{2n+4}+l_{2n+6}.$$
From Proposition 2.1 (iii), we have $n(F_n)=f_n^2+f_{n+1}^2+f_{n+2}^2+f_{n+3}^2=f_{2n+1}+f_{2n+5}$ and we apply the relation (1).

(3) Using Proposition 2.1 (ii), we have
$$n(F_n+L_n)=(f_n+l_n)^2+(f_{n+1}+l_{n+1})^2+(f_{n+2}+l_{n+2})^2+(f_{n+3}+l_{n+3})^2$$
$$=f_n^2+f_{n+1}^2+f_{n+2}^2+f_{n+3}^2+l_n^2+l_{n+1}^2+l_{n+2}^2+l_{n+3}^2+2f_nl_n+2f_{n+1}l_{n+1}+2f_{n+2}l_{n+2}+2f_{n+3}l_{n+3}$$
$$=n(F_n)+n(L_n)+2(f_{2n}+f_{2n+2}+f_{2n+4}+f_{2n+6})$$
$$=n(F_n)+n(L_n)+2(f_{2n+1}-f_{2n-1}+f_{2n+3}-f_{2n+1}+f_{2n+5}-f_{2n+3}+f_{2n+7}-f_{2n+5})$$
$$=n(F_n)+n(L_n)+2(-f_{2n-1}+f_{2n+7}).$$
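The three relations of Proposition 3.5 can be checked numerically for small $n$. The sketch below is an illustration only: it assumes, as the proof above does, that the norm of a quaternion is the sum of the squares of its four coefficients, and the helper names `fib_lucas`, `norm` and `check_prop_3_5` are hypothetical.

```python
def fib_lucas(count):
    """Return the first `count` Fibonacci numbers f and Lucas numbers l."""
    f, l = [0, 1], [2, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
        l.append(l[-1] + l[-2])
    return f[:count], l[:count]


def norm(coeffs):
    """Sum of squares of the coefficients (the norm used in the proof)."""
    return sum(c * c for c in coeffs)


def check_prop_3_5(n_max):
    """Check relations (1)-(3) of Proposition 3.5 for 1 <= n <= n_max."""
    f, l = fib_lucas(2 * n_max + 8)
    for n in range(1, n_max + 1):
        F = [f[n + i] for i in range(4)]          # coefficients of F_n
        L = [l[n + i] for i in range(4)]          # coefficients of L_n
        S = [x + y for x, y in zip(F, L)]         # coefficients of F_n + L_n
        assert 5 * norm(F) == norm(L)                                   # (1)
        assert 5 * (f[2 * n + 1] + f[2 * n + 5]) == sum(
            l[2 * n + 2 * i] for i in range(4))                         # (2)
        assert norm(S) == norm(F) + norm(L) + 2 * (
            f[2 * n + 7] - f[2 * n - 1])                                # (3)
    return True
```

The range starts at $n=1$ so that $f_{2n-1}$ has a non-negative index.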


Fibonacci numbers are defined by using the addition operation over the real field. It is interesting to investigate what happens when, instead of addition, we use an arbitrary binary relation over $\mathbb R$. Therefore, we consider "$*$" a binary relation over $\mathbb R$ and $a,b\in\mathbb R$. We define the following sequence
$$\varphi_n=\varphi_{n-1}*\varphi_{n-2},\quad \varphi_0=a,\ \varphi_1=b.$$

We call this sequence the left Fibonacci type sequence attached to the binary relation "$*$", generated by $a$ and $b$.
The following sequence
$$\varphi_n=\varphi_{n-2}*\varphi_{n-1},\quad \varphi_0=a,\ \varphi_1=b$$
is called the right Fibonacci type sequence attached to the binary relation "$*$", generated by $a$ and $b$.




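Proposition 3.6 Let $A,B\in\mathbb R$ such that $\Delta=A^2+4B>0$. We define on $\mathbb R$ the following binary relation
$$x*y=Ax+By,\quad x,y\in\mathbb R.$$

We consider $(\varphi_n)_{n\in\mathbb N}$ the left Fibonacci type sequence attached to the binary relation "$*$". Then we have
$$\varphi_{n+1}=\frac{1}{\beta-\alpha}\left[(-b+a\beta)\alpha^{n+1}+(b-a\alpha)\beta^{n+1}\right] \quad (3.11)$$
where $\alpha=\frac{A+\sqrt\Delta}{2}$, $\beta=\frac{A-\sqrt\Delta}{2}$, and
$$L=\lim_{n\to\infty}\frac{\varphi_{n+1}}{\varphi_n}=\max\{\alpha,\beta\}. \quad (3.12)$$

Proof Taking $\varphi_0=a$, $\varphi_1=b$, we get

$\varphi_2=\varphi_1*\varphi_0=Ab+Ba$,

$\varphi_3=\varphi_2*\varphi_1=A(Ab+Ba)+Bb=(A^2+B)b+ABa$, etc. From here, $\varphi_n$ can be written under the form
$$\varphi_n=\varphi_{n-1}*\varphi_{n-2}=x_nb+y_na.$$

We have
$$\varphi_{n+1}=x_{n+1}b+y_{n+1}a$$
and $\varphi_{n+1}=A\varphi_n+B\varphi_{n-1}=A(x_nb+y_na)+B(x_{n-1}b+y_{n-1}a)=(Ax_n+Bx_{n-1})b+(Ay_n+By_{n-1})a$. It results,
$$x_{n+1}=Ax_n+Bx_{n-1}$$
and
$$y_{n+1}=Ay_n+By_{n-1}.$$

We consider the roots of the equation
$$x^2-Ax-B=0,$$
$$\alpha=\frac{A+\sqrt\Delta}{2},\quad \beta=\frac{A-\sqrt\Delta}{2},\quad \Delta=A^2+4B.$$

Therefore,
$$x_{n+1}=X_1\alpha^{n+1}+X_2\beta^{n+1},$$
$$y_{n+1}=Y_1\alpha^{n+1}+Y_2\beta^{n+1},$$
with $X_1+X_2=0$, $X_1\alpha+X_2\beta=1$ and $Y_1+Y_2=1$, $Y_1\alpha+Y_2\beta=0$. We obtain $X_1=-X_2=\frac{1}{\alpha-\beta}$, $Y_1=\frac{\beta}{\beta-\alpha}$, $Y_2=-\frac{\alpha}{\beta-\alpha}$, hence
$$x_{n+1}=\frac{1}{\beta-\alpha}\left(-\alpha^{n+1}+\beta^{n+1}\right)$$
and
$$y_{n+1}=\frac{1}{\beta-\alpha}\left(\beta\alpha^{n+1}-\alpha\beta^{n+1}\right).$$

Therefore, by straightforward calculations, we get
$$\varphi_{n+1}=\frac{1}{\beta-\alpha}\left[(-b+a\beta)\alpha^{n+1}+(b-a\alpha)\beta^{n+1}\right].$$

This formula is a Binet-type formula.
Now, for the limit $L$, we obtain
$$L=\lim_{n\to\infty}\frac{\varphi_{n+1}}{\varphi_n}=\lim_{n\to\infty}\frac{(-b+a\beta)\alpha^{n+1}+(b-a\alpha)\beta^{n+1}}{(-b+a\beta)\alpha^{n}+(b-a\alpha)\beta^{n}}=\max\{\alpha,\beta\}.$$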
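Formula (3.11) can be compared against the recurrence $\varphi_{n+1}=A\varphi_n+B\varphi_{n-1}$ directly. The following is a minimal numerical sketch, not part of the original text; the function names `left_sequence` and `binet_type` are hypothetical, and the floating-point comparison is only approximate.

```python
import math


def left_sequence(A, B, a, b, count):
    """First `count` terms of phi_n = A*phi_{n-1} + B*phi_{n-2}, phi_0 = a, phi_1 = b."""
    phi = [a, b]
    while len(phi) < count:
        phi.append(A * phi[-1] + B * phi[-2])
    return phi[:count]


def binet_type(A, B, a, b, n):
    """Evaluate phi_n by the Binet-type formula (3.11); requires A^2 + 4B > 0."""
    delta = A * A + 4 * B
    alpha = (A + math.sqrt(delta)) / 2
    beta = (A - math.sqrt(delta)) / 2
    return ((-b + a * beta) * alpha ** n + (b - a * alpha) * beta ** n) / (beta - alpha)


# Example with A = 2, B = 3, a = 1, b = 4: the roots are alpha = 3, beta = -1.
terms = left_sequence(2, 3, 1, 4, 32)
ratio = terms[31] / terms[30]   # approaches alpha = 3, in line with (3.12)
```

For $A=B=1$, $a=0$, $b=1$ the same functions reproduce the Fibonacci numbers and the ratio approaches the Golden ratio.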


Remark 3.7 (1) Similar results can be obtained for the right Fibonacci type 
sequences. 
(2) The above formula (3.11) is a Binet-type formula. Indeed, for $A=B=1$, $a=0$, $b=1$, we obtain Binet's formula for the Fibonacci sequence.
(3) The limit $L$ from relation (3.12) is a Golden-ratio type number. Indeed, for $A=B=1$, $a=0$, $b=1$, we obtain the Golden ratio.

We consider now the following difference equation of degree two
$$d_n=ad_{n-1}+bd_{n-2},\quad d_0=\alpha,\ d_1=\beta, \quad (3.13)$$
with $a,b,\alpha,\beta$ arbitrary integers, $n\in\mathbb N$. It is clear that $d_2=a\beta+b\alpha$, $d_3=a^2\beta+ab\alpha+b\beta$, etc.
Let $(G,*)$ be an arbitrary group and $g_0,g_1\in G$. We define the following sequence
$$\varphi_n=\varphi_{n-1}^{a}*\varphi_{n-2}^{b},\quad \varphi_0=g_1^{\alpha}*g_0^{d_{-1}},\quad \varphi_1=g_1^{\beta}*g_0^{\alpha}, \quad (3.14)$$
where $d_{-1}$ is defined depending on the sequence $(d_n)_{n\in\mathbb N}$ and $g^{a}=g*g*\cdots*g$, $a$-times, $g\in G$. This sequence is called the left d-type sequence attached to the binary relation "$*$", generated by $g_0$ and $g_1$.

In a similar way, we define the right d-type sequence attached to the binary relation "$*$", generated by $g_0$ and $g_1$, namely
$$\varphi_n=\varphi_{n-2}^{b}*\varphi_{n-1}^{a},\quad \varphi_0=g_1^{\alpha}*g_0^{d_{-1}},\quad \varphi_1=g_1^{\beta}*g_0^{\alpha}, \quad (3.15)$$
where $d_{-1}$ is defined depending on the sequence $(d_n)_{n\in\mathbb N}$ and $g^{a}=g*g*\cdots*g$, $a$-times, $g\in G$.


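Proposition 3.8 With the above notations, if $g_1*g_0=g_0*g_1$, the following relation holds:
$$\varphi_n=g_1^{d_n}*g_0^{d_{n-1}}. \quad (3.16)$$

Proof By straightforward calculations, it results
$$\varphi_2=\varphi_1^{a}*\varphi_0^{b}=\left(g_1^{\beta}*g_0^{\alpha}\right)^{a}*\left(g_1^{\alpha}*g_0^{d_{-1}}\right)^{b}=g_1^{a\beta+b\alpha}*g_0^{a\alpha+bd_{-1}}=g_1^{ad_1+bd_0}*g_0^{ad_0+bd_{-1}}=g_1^{d_2}*g_0^{d_1},$$
$$\varphi_3=\varphi_2^{a}*\varphi_1^{b}=\left(g_1^{d_2}*g_0^{d_1}\right)^{a}*\left(g_1^{d_1}*g_0^{d_0}\right)^{b}=g_1^{d_3}*g_0^{d_2}.$$
Using induction, we get
$$\varphi_{n+1}=\varphi_n^{a}*\varphi_{n-1}^{b}=\left(g_1^{d_n}*g_0^{d_{n-1}}\right)^{a}*\left(g_1^{d_{n-1}}*g_0^{d_{n-2}}\right)^{b}=g_1^{d_{n+1}}*g_0^{d_n}.$$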
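Relation (3.16) can be tested in a concrete commutative group. The sketch below is an illustration under stated assumptions, not the authors' construction: the group is the multiplicative group of integers modulo a prime $p$, the coefficient $b$ is fixed to 1 so that $d_{-1}=d_1-ad_0$ is an integer, and the function name `check_relation_3_16` is hypothetical. It relies on Python's built-in `pow` accepting a negative exponent together with a modulus (Python 3.8 or later).

```python
def check_relation_3_16(p, g0, g1, a, alpha, beta, steps):
    """Check phi_n = g1^{d_n} * g0^{d_{n-1}} in (Z/pZ)^* for n = 2..steps+1, with b = 1."""
    b = 1
    d_minus1 = beta - a * alpha            # from d_1 = a*d_0 + b*d_{-1} with b = 1
    d = [alpha, beta]
    for _ in range(steps):
        d.append(a * d[-1] + b * d[-2])    # difference equation (3.13)

    # Initial terms as in (3.14).
    phi_prev = pow(g1, alpha, p) * pow(g0, d_minus1, p) % p   # phi_0
    phi_cur = pow(g1, beta, p) * pow(g0, alpha, p) % p        # phi_1

    for n in range(2, steps + 2):
        phi_next = pow(phi_cur, a, p) * pow(phi_prev, b, p) % p
        phi_prev, phi_cur = phi_cur, phi_next
        expected = pow(g1, d[n], p) * pow(g0, d[n - 1], p) % p
        if phi_cur != expected:
            return False
    return True
```

With $a=1$, $\alpha=0$, $\beta=1$ the exponents are Fibonacci numbers, and with $a=1$, $\alpha=2$, $\beta=1$ they are Lucas numbers, which corresponds to the two special cases discussed next in the text.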


Remark 3.9 (1) For $a=b=1$, $\alpha=0$, $\beta=1$, we get the Fibonacci sequence and relation (3.14) becomes
$$\varphi_n=\varphi_{n-2}*\varphi_{n-1},\quad \varphi_0=g_0,\ \varphi_1=g_1,$$
since $f_{-1}=1$. In this case, relation (3.16) has the following form
$$\varphi_n=g_1^{f_n}*g_0^{f_{n-1}}. \quad (3.17)$$

Relation (3.17) was independently found in [6], Proposition 5.2, and it is a particular case of relation (3.16).

(2) For $a=b=1$, $\alpha=2$, $\beta=1$, we get the Lucas sequence and relation (3.14) becomes
$$\varphi_n=\varphi_{n-2}*\varphi_{n-1},\quad \varphi_0=g_1^{2}*g_0^{-1},\ \varphi_1=g_1*g_0^{2}.$$
In this case, relation (3.16) has the following form
$$\varphi_n=g_1^{l_n}*g_0^{l_{n-1}}.$$


4 Conclusion 


In this paper, we provided new applications of Fibonacci and Lucas sequences. In Propositions 3.2 and 3.3, we obtained some algebraic structures defined by using Fibonacci and Lucas elements. In Proposition 3.5, we provided some new properties and relations of these sequences.

In Propositions 3.6 and 3.8, we also generalized Fibonacci and Lucas numbers by using an arbitrary binary relation over the real field instead of addition of real numbers, and we gave properties of the newly obtained sequences.

Moreover, taking into consideration some known relations between these numbers, in Proposition 3.4 we gave a method to find new examples of split quaternion algebras. This method can be an easy alternative to using the properties of quadratic forms.

As further research, we intend to continue the study of such sequences, especially in their connections with quaternion algebras, expecting to obtain new and interesting results.
Some Remarks Regarding Finite Bounded Commutative BCK-Algebras
Abstract In this chapter, starting from some results obtained in the papers Flaut, C., Vasile, R., Wajsberg algebras arising from binary block codes, and Flaut, C., Hošková-Mayerová, Š., Saeid, A. B., Vasile, R., Wajsberg algebras of order n, n ≤ 9, we provide some examples of finite bounded commutative BCK-algebras, using the Wajsberg algebra associated to a bounded commutative BCK-algebra. This method is an alternative to Iseki's construction, since some properties of the obtained algebras are lost under Iseki's extension.
1 Introduction


BCK-algebras were first introduced in mathematics in 1966 by Y. Imai and K. Iseki, 
through the paper [9]. These algebras were presented as a generalization of the 
concept of set-theoretic difference and propositional calculi. The class of BCK- 
algebras is a proper subclass of the class of BCI-algebras. 
Definition 1 ([9, 13]) An algebra $(X,*,\theta)$ of type $(2,0)$ is called a BCI-algebra if the following conditions are fulfilled:

(1) $((x*y)*(x*z))*(z*y)=\theta$, for all $x,y,z\in X$;

(2) $(x*(x*y))*y=\theta$, for all $x,y\in X$;

(3) $x*x=\theta$, for all $x\in X$;

(4) For all $x,y\in X$ such that $x*y=\theta$ and $y*x=\theta$, it results $x=y$.

If a BCI-algebra $X$ satisfies the following identity:

(5) $\theta*x=\theta$, for all $x\in X$, then $X$ is called a BCK-algebra.


Definition 2 ([9]) (i) A BCK-algebra $(X,*,\theta)$ is called commutative if
$$x*(x*y)=y*(y*x),$$
for all $x,y\in X$, and implicative if
$$x*(y*x)=x,$$
for all $x,y\in X$.
(ii) A BCK-algebra $(X,*,\theta)$ is called positive implicative if and only if
$$(x*y)*z=(x*z)*(y*z),$$
for all $x,y,z\in X$ [5].

The partial order relation on a BCK-algebra is defined such that $x\le y$ if and only if $x*y=\theta$.

If in the BCK-algebra $(X,*,\theta)$ there is an element $1$ such that $x\le 1$, for all $x\in X$, then the algebra $X$ is called a bounded BCK-algebra. In a bounded BCK-algebra, we denote $1*x=\bar x$.

If in the bounded BCK-algebra $X$, an element $x\in X$ satisfies the relation
$$\bar{\bar x}=x,$$
then the element $x$ is called an involution.

If $(X,*,\theta)$ and $(Y,\circ,\theta)$ are two BCK-algebras, a map $f:X\to Y$ with the property $f(x*y)=f(x)\circ f(y)$, for all $x,y\in X$, is called a BCK-algebras morphism. If $f$ is a bijective map, then $f$ is an isomorphism of BCK-algebras.


Definition 3 (1) Let $(X,*,\theta)$ be a BCK-algebra and $Y$ be a nonempty subset of $X$. Then $Y$ is called a subalgebra of the algebra $(X,*,\theta)$ if and only if for each $x,y\in Y$, we have $x*y\in Y$. This implies that $Y$ is closed under the binary multiplication "$*$".

It is well known that each BCK-algebra of degree $n+1$ contains a subalgebra of degree $n$.
(2) Let $(X,*,\theta)$ be a BCK-algebra and $I$ be a nonempty subset of $X$. Then $I$ is called an ideal of the algebra $X$ if and only if for each $x,y\in X$ we have:
(i) $\theta\in I$;
(ii) if $x*y\in I$ and $y\in I$, then $x\in I$.

Proposition 4 ([11]) Let $(X,*,\theta)$ be a BCK-algebra and $Y$ be a subalgebra of the algebra $(X,*,\theta)$. The following statements are true:

(i) $\theta\in Y$;
(ii) $(Y,*,\theta)$ is also a BCK-algebra.


Let $X$ be a BCK-algebra such that $1\notin X$. On the set $Y=X\cup\{1\}$, we define the multiplication "$\circ$" as follows:
$$x\circ y=\begin{cases}x*y,&\text{if }x,y\in X;\\ \theta,&\text{if }x\in X,\ y=1;\\ 1,&\text{if }x=1\text{ and }y\in X;\\ \theta,&\text{if }x=y=1.\end{cases}$$

The obtained algebra $(Y,\circ,\theta)$ is a bounded BCK-algebra, called the algebra obtained from the algebra $(X,*,\theta)$ by Iseki's extension ([11], Theorem 3.6).
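Iseki's extension is easy to experiment with on small finite algebras. The following sketch is an illustration, not taken from the source: it uses the three-element chain $\{0,1,2\}$ with $x*y=\max(x-y,0)$ as a sample commutative BCK-algebra, adjoins a new top element (written `"top"` here in place of the symbol 1), and checks the axioms of Definition 1 by brute force. All function names are hypothetical.

```python
from itertools import product


def iseki_extension(elems, op, zero, top):
    """Return the elements and operation of the Iseki extension Y = X + {top}."""
    assert top not in elems

    def ext(x, y):
        if x != top and y != top:
            return op(x, y)        # both in X: original multiplication
        if x == top and y != top:
            return top             # top o y = top for y in X
        return zero                # x o top = zero, including top o top

    return list(elems) + [top], ext


def is_bck(elems, op, zero):
    """Brute-force check of axioms (1)-(5) of Definition 1."""
    for x, y, z in product(elems, repeat=3):
        if op(op(op(x, y), op(x, z)), op(z, y)) != zero:
            return False
    for x, y in product(elems, repeat=2):
        if op(op(x, op(x, y)), y) != zero:
            return False
        if op(x, y) == zero and op(y, x) == zero and x != y:
            return False
    return all(op(x, x) == zero and op(zero, x) == zero for x in elems)


def is_commutative(elems, op):
    """Check x*(x*y) = y*(y*x) for all pairs."""
    return all(op(x, op(x, y)) == op(y, op(y, x))
               for x, y in product(elems, repeat=2))


def chain_op(x, y):
    """Truncated subtraction on the chain 0 < 1 < 2."""
    return max(x - y, 0)


X = [0, 1, 2]
Y, ext_op = iseki_extension(X, chain_op, 0, "top")
```

In this sample, `Y` is again a bounded BCK-algebra, but the commutativity of `X` does not survive the extension: taking $x$ to be the new top element and $y=1$ gives $x\circ(x\circ y)=0$ while $y\circ(y\circ x)=1$.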


Remark 5 ([11])

(i) The Iseki's extension of a positive implicative BCK-algebra is still a positive implicative BCK-algebra.
(ii) The Iseki's extension of a commutative BCK-algebra is, in general, not a commutative BCK-algebra.
(iii) Let $X$ be a BCK-algebra and $Y$ its Iseki's extension. Then $X$ is an ideal in $Y$.
(iv) If $I$ is an ideal of the BCK-algebra $X$, $x\in I$ and $y\le x$, then $y\in I$. See also [8].

In the following, we give some examples of finite bounded commutative BCK-algebras. In the finite case, it is very useful to have many examples of such algebras, but such examples are, in general, not easy to find. One method for this purpose is Iseki's extension. However, from the above, we remark that Iseki's extension cannot always be used to obtain examples of finite commutative bounded BCK-algebras with given initial properties, since commutativity, or other properties, can be lost. For this reason, we use another technique to provide examples of such algebras. We use the connections between finite commutative bounded BCK-algebras and Wajsberg algebras and the algorithm and examples given in the papers [6, 7].


2 Connections Between Finite Bounded Commutative 
BCK-Algebras and Wajsberg Algebras 


Definition 6 ([3]) An abelian monoid $(X,\theta,\oplus)$ is called an MV-algebra if and only if we have a unary operation "$'$" such that:
(i) $(x')'=x$;
(ii) $x\oplus\theta'=\theta'$;
(iii) $(x'\oplus y)'\oplus y=(y'\oplus x)'\oplus x$, for all $x,y\in X$ [12]. We denote it by $(X,\oplus,',\theta)$.

We remark that in an MV-algebra the constant element $\theta'$ is denoted by $1$, that is,
$$1=\theta'.$$
With the above definitions, the following multiplications are also defined (see [12]):
$$x\odot y=(x'\oplus y')',$$
$$x\ominus y=x\odot y'=(x'\oplus y)'.$$

From [4], Theorem 1.7.1, for a bounded commutative BCK-algebra $(X,*,\theta,1)$, if we define
$$x'=1*x,$$
$$x\oplus y=1*((1*x)*y)=(x'*y)',\quad x,y\in X,$$
we obtain that the algebra $(X,\oplus,',\theta)$ is an MV-algebra, with
$$x\ominus y=x*y.$$

The converse is also true, that is, if $(X,\oplus,\theta,')$ is an MV-algebra, then $(X,\ominus,\theta,1)$ is a bounded commutative BCK-algebra.


Definition 7 ([4], Definition 4.2.1) An algebra $(W,\circ,\bar{\ },1)$ of type $(2,1,0)$ is called a Wajsberg algebra (or W-algebra) if and only if the following conditions are fulfilled:

(i) $1\circ x=x$;
(ii) $(x\circ y)\circ[(y\circ z)\circ(x\circ z)]=1$;
(iii) $(x\circ y)\circ y=(y\circ x)\circ x$;
(iv) $(\bar x\circ\bar y)\circ(y\circ x)=1$, for every $x,y,z\in W$.

Remark 8 ([4], Lemma 4.2.2 and Theorem 4.2.5)

(i) For $(W,\circ,\bar{\ },1)$ a Wajsberg algebra, if we define the following multiplications
$$x\odot y=\overline{(x\circ\bar y)}$$
and
$$x\oplus y=\bar x\circ y,$$
for all $x,y\in W$, we obtain that $(W,\oplus,\odot,\bar{\ },0,1)$ is an MV-algebra.
(ii) Conversely, if $(X,\oplus,\odot,',\theta,1)$ is an MV-algebra, defining on $X$ the operation
$$x\circ y=x'\oplus y,$$
we obtain that $(X,\circ,',1)$ is a Wajsberg algebra.

Remark 9 From the above, if $(W,\circ,\bar{\ },1)$ is a Wajsberg algebra, then $(W,\oplus,\odot,\bar{\ },0,1)$ is an MV-algebra, with
$$x\ominus y=\overline{(\bar x\oplus y)}=\overline{(x\circ y)}. \quad (2.1)$$

Defining
$$x*y=\overline{(x\circ y)}, \quad (2.2)$$
we have that $(W,*,0,1)$ is a bounded commutative BCK-algebra.

Using the above remark, starting from some known finite examples of Wajsberg algebras given in the papers [6, 7], we can obtain examples of finite commutative bounded BCK-algebras, using the following algorithm.


3 The Algorithm 


(1) Let $n$ be a natural number, $n\neq 0$, and
$$n=r_1r_2\cdots r_t,\quad r_i\in\mathbb N,\ 1<r_i\le n,\ i\in\{1,2,\dots,t\},$$
be the decomposition of the number $n$ in factors. Decompositions with the same terms, but in a different order in the product, are counted one time. The number of all such decompositions will be denoted by $\pi_n$.

(2) There are only $\pi_n$ nonisomorphic, as ordered sets, Wajsberg algebras with $n$ elements. We obtain these algebras as a finite product of totally ordered Wajsberg algebras (see [7] Theorem 4.8 and [2]).

(3) Using Remark 9 from above, to each Wajsberg algebra a commutative bounded BCK-algebra can be associated.
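Step (1) of the algorithm amounts to counting the factorizations of $n$ into factors greater than 1, with the order of the factors ignored. The sketch below is one way to compute this count; it assumes that the trivial decomposition $n=n$ is included (which agrees with the two four-element examples given in the next section), and the function name `unordered_factorizations` is hypothetical.

```python
def unordered_factorizations(n, min_factor=2):
    """Count the ways to write n as a product of factors >= min_factor, order ignored.

    Factors are generated in non-decreasing order, so each unordered
    decomposition is counted exactly once.
    """
    if n == 1:
        return 1                      # empty product: one way to finish
    count = 0
    for r in range(min_factor, n + 1):
        if n % r == 0:
            count += unordered_factorizations(n // r, r)
    return count
```

For example, $n=8$ has the decompositions $8$, $2\cdot 4$ and $2\cdot 2\cdot 2$, so the function returns 3 for the argument 8.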


4 Examples of Finite Commutative Bounded 
BCK-Algebras 


In the following, we use some examples of Wajsberg algebras given in the paper 
[6]. To these algebras, we will associate the corresponding commutative bounded 
BCK-algebras and we give their subalgebras and ideals. 
Example 10 Let $W=\{O<A<B<E\}$ be a totally ordered set. On $W$ we define the multiplication $\circ_1$ as in the table below, such that $(W,\circ_1,E)$ is a Wajsberg algebra. We have $\bar A=B$ and $\bar B=A$.

$\circ_1$ | O A B E
O | E E E E
A | B E E E
B | A B E E
E | O A B E

(See [6], Example 4.1.1.)
Therefore, the associated commutative bounded BCK-algebra $(W,*_1,O)$ has the multiplication given in the table below, obtained from relation (2.2):

$*_1$ | O A B E
O | O O O O
A | A O O O
B | B A O O
E | E B A O

The proper subalgebras of this algebra are: $\{O,A\}$, $\{O,B\}$, $\{O,E\}$, $\{O,A,B\}$.
There are no proper ideals in the algebra $(W,*_1,O)$.


Example 11 Let $W=\{O<A<B<E\}$ be a totally ordered set. On $W$ we define the multiplication $\circ_2$ as in the table below, such that $(W,\circ_2,E)$ is a Wajsberg algebra.

$\circ_2$ | O A B E
O | E E E E
A | B E B E
B | A A E E
E | O A B E

Therefore, the associated commutative bounded BCK-algebra $(W,*_2,O)$ has the multiplication given below, obtained from relation (2.2):

$*_2$ | O A B E
O | O O O O
A | A O A O
B | B B O O
E | E B A O

The proper subalgebras of this algebra are: $\{O,A\}$, $\{O,B\}$, $\{O,E\}$, $\{O,A,B\}$. The proper ideals are: $\{O,A\}$, $\{O,B\}$.


Example 12 Let $W=\{O<A<B<C<D<E\}$ be a totally ordered set. On $W$ we define a multiplication $\circ_3$ given in the table below, such that $(W,\circ_3,E)$ is a Wajsberg algebra. We have $\bar A=D$, $\bar B=C$, $\bar C=B$, $\bar D=A$.

$\circ_3$ | O A B C D E
O | E E E E E E
A | D E E E E E
B | C D E E E E
C | B C D E E E
D | A B C D E E
E | O A B C D E

Therefore, the associated commutative bounded BCK-algebra $(W,*_3,O)$ has the multiplication defined below:

$*_3$ | O A B C D E
O | O O O O O O
A | A O O O O O
B | B A O O O O
C | C B A O O O
D | D C B A O O
E | E D C B A O

The proper subalgebras of this algebra are: $\{O,A\}$, $\{O,B\}$, $\{O,C\}$, $\{O,D\}$, $\{O,E\}$, $\{O,A,B\}$, $\{O,B,D\}$, $\{O,A,B,C\}$, $\{O,A,B,C,D\}$.
This algebra has no proper ideals.
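The passage from the Wajsberg table to the BCK table in Example 12 follows relation (2.2), $x*y=\overline{(x\circ y)}$, with $\bar x=x\circ O$. The sketch below encodes the $\circ_3$ table as strings and derives the BCK multiplication from it. The encoding and the function names are hypothetical choices made for illustration, and "proper ideal" is taken here, as an assumption, to mean an ideal different from $\{O\}$ and $W$.

```python
from itertools import combinations

ELEMS = "OABCDE"

# Rows of the Wajsberg table of Example 12: WAJSBERG[x][j] is x o ELEMS[j].
WAJSBERG = {
    "O": "EEEEEE",
    "A": "DEEEEE",
    "B": "CDEEEE",
    "C": "BCDEEE",
    "D": "ABCDEE",
    "E": "OABCDE",
}


def imp(x, y):
    """Wajsberg multiplication x o y read from the table."""
    return WAJSBERG[x][ELEMS.index(y)]


def complement(x):
    """The complement of x, computed as x o O."""
    return imp(x, "O")


def bck(x, y):
    """BCK multiplication from relation (2.2): x * y = complement of (x o y)."""
    return complement(imp(x, y))


def bck_table():
    """Rows of the derived BCK table, in the same string encoding."""
    return {x: "".join(bck(x, y) for y in ELEMS) for x in ELEMS}


def is_subalgebra(subset):
    """A subset is a subalgebra if it is closed under the BCK multiplication."""
    return all(bck(x, y) in subset for x in subset for y in subset)


def is_ideal(subset):
    """Ideal test of Definition 3 (2): O in I, and x*y in I with y in I imply x in I."""
    return "O" in subset and all(
        x in subset for x in ELEMS for y in subset if bck(x, y) in subset)


def proper_ideals():
    """All ideals other than {O} and the whole set (assumed meaning of 'proper')."""
    found = []
    for size in range(2, len(ELEMS)):
        for rest in combinations(ELEMS[1:], size - 1):
            candidate = {"O", *rest}
            if is_ideal(candidate):
                found.append(candidate)
    return found
```

Applied to the table above, `bck_table()` reproduces the rows of the $*_3$ table and `proper_ideals()` returns an empty list.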


Example 13 Let $W=\{O<A<B<C<D<E\}$ be a totally ordered set. On $W$ we define a multiplication $\circ_4$ given in the table below, such that $(W,\circ_4,E)$ is a Wajsberg algebra. We have $\bar A=D$, $\bar B=C$, $\bar C=B$, $\bar D=A$.

$\circ_4$ | O A B C D E
O | E E E E E E
A | D E E D E E
B | C D E C D E
C | B B B E E E
D | A B B D E E
E | O A B C D E

Therefore, the associated commutative bounded BCK-algebra $(W,*_4,O)$ has the multiplication given in the following table:

$*_4$ | O A B C D E
O | O O O O O O
A | A O O A O O
B | B A O B A O
C | C C C O O O
D | D C C A O O
E | E D C B A O

The proper subalgebras of this algebra are: $\{O,A\}$, $\{O,B\}$, $\{O,C\}$, $\{O,D\}$, $\{O,E\}$, $\{O,A,B\}$, $\{O,A,B,C\}$, $\{O,A,B,C,D\}$, $\{O,A,C\}$, $\{O,A,C,D\}$.
The proper ideals of this algebra are: $\{O,A,B\}$, $\{O,C\}$.


Example 14 Let $W=\{O<A<B<C<D<E\}$ be a totally ordered set. On $W$ we define a multiplication $\circ_5$ given in the table below, such that $(W,\circ_5,E)$ is a Wajsberg algebra. We have $\bar A=C$, $\bar B=D$, $\bar C=A$, $\bar D=B$.

$\circ_5$ | O A B C D E
O | E E E E E E
A | C E A D D E
B | D E E D D E
C | A E A E E E
D | B A B A E E
E | O A B C D E

Therefore, the associated commutative bounded BCK-algebra $(W,*_5,O)$ has the multiplication defined in the following table:

$*_5$ | O A B C D E
O | O O O O O O
A | A O C B B O
B | B O O B B O
C | C O C O O O
D | D C D C O O
E | E C D A B O

The proper subalgebras of this algebra are: $\{O,A\}$, $\{O,B\}$, $\{O,C\}$, $\{O,D\}$, $\{O,E\}$, $\{O,B,C\}$, $\{O,C,D\}$, $\{O,A,B,C\}$, $\{O,A,B,C,D\}$.
All proper ideals are: $\{O,C,D\}$, $\{O,B\}$.


Example 15 Let $W=\{O<X<Y<Z<T<U<V<E\}$ be a totally ordered set. On $W$ we define a multiplication $\circ_6$ which can be found in the table below. We obtain that $(W,\circ_6,E)$ is a Wajsberg algebra. We have $\bar X=V$, $\bar Y=U$, $\bar Z=T$. Therefore the algebra $W$ has the following multiplication table:

$\circ_6$ | O X Y Z T U V E
O | E E E E E E E E
X | V E E E E E E E
Y | U V E E E E E E
Z | T U V E E E E E
T | Z T U V E E E E
U | Y Z T U V E E E
V | X Y Z T U V E E
E | O X Y Z T U V E

From here, we get that the associated commutative bounded BCK-algebra $(W,*_6,O)$ has the multiplication given in the table below:

$*_6$ | O X Y Z T U V E
O | O O O O O O O O
X | X O O O O O O O
Y | Y X O O O O O O
Z | Z Y X O O O O O
T | T Z Y X O O O O
U | U T Z Y X O O O
V | V U T Z Y X O O
E | E V U T Z Y X O

The proper subalgebras of this algebra are: $\{O,J\}$, $J\in\{X,Y,Z,T,U,V,E\}$, $\{O,X,Y\}$, $\{O,X,Y,Z\}$, $\{O,X,Y,Z,T\}$, $\{O,X,Y,Z,T,U\}$, $\{O,X,Y,Z,T,U,V\}$. There are no proper ideals.


Example 16 Let $W=\{O<X<Y<Z<T<U<V<E\}$ be a totally ordered set. On $W$ we define a multiplication $\circ_7$ given in the table below, such that $(W,\circ_7,E)$ is a Wajsberg algebra.

$\circ_7$ | O X Y Z T U V E
O | E E E E E E E E
X | V E V E V E V E
Y | U U E E E E E E
Z | T U V E V E V E
T | Z Z U U E E E E
U | Y Z T U T E V E
V | X X Z Z U U E E
E | O X Y Z T U V E

Therefore, the associated commutative bounded BCK-algebra $(W,*_7,O)$ has the multiplication defined in the table below:


OXYZTUVE 
OO0O00000O 


mT QHN OF 
mT CHNN& 


SSS SSS 
GNNS&<O 
SAS AO On@ 
NSWNO ¥ OX 
<NWOO000 
SOX OM OX 
SeeoQo0090 
The proper subalgebras of this algebra are: $\{O,J\}$, $J\in\{X,Y,Z,T,U,V,E\}$, $\{O,X,Y\}$, $\{O,X,Y,Z\}$, $\{O,T,Y\}$, $\{O,T,Y,V\}$, $\{O,X,Y,Z,T\}$, $\{O,X,Y,Z,T,U\}$, $\{O,X,Y,Z,T,U,V\}$.

All proper ideals are: $\{O,Y,T,V\}$, $\{O,X\}$.


5 Conclusions


In this chapter, we provided an algorithm for finding examples of finite commutative bounded BCK-algebras, using their connections with Wajsberg algebras. This algorithm allows us to find such examples regardless of the order of the algebra. This is very useful, since examples of such algebras are rarely encountered in specialized books.
Revisiting Fixed Point Results with a Contractive Iterative at a Point


Erdal Karapinar 


Abstract In this manuscript, we discuss fixed point results with a contractive iterative at a point in the setting of various abstract spaces. The first aim of this paper is to collect the corresponding basic results on the topic from the literature. After that, our purpose is to combine and connect several existing results in this direction by generalizing the famous theorem of Matkowski. We consider some consequences of our main result to illustrate its genuineness.


Keywords Fixed point theorems - Metric space - Contraction mapping - 
Contractive mapping - Contractive iterative at a point 


2000 Mathematics Subject Classification: 47H10 -54H10 


1 Introduction and Preliminaries 


It is generally accepted that metric fixed point theory was initiated by Banach [8] in 1922. Most sources note that the pioneering result in metric fixed point theory was the renowned Banach contraction mapping principle (or contraction mapping principle), which says that each contraction on a complete metric space possesses a unique fixed point. This generally accepted opinion is not wrong, but it is not entirely true either. A few sources clarify that the statement above, the contraction mapping principle in a complete metric space, belongs to Caccioppoli [15], not to Banach. Indeed, Banach proved that every contraction has a unique fixed point in the context of complete normed spaces. In fact, this is very natural and reasonable since, in almost the same decade, the notion of metric space had been formulated by Fréchet [23] under the name of L-space, and it has been called a "metric space" after Hausdorff [25].

We underline also that the idea of Banach [8] is a perfect abstraction and formu- 
lation of the method of the successive approximation that has been used to solve 
the existence and uniqueness of certain differential equations with an initial value 
problem. When it is considered from this point of view we can extend the root of 
metric fixed point theory to significant results of famous researchers, e.g. Liouville 
[35, 36], Cauchy [16], Lipschitz [37], Peano [40] and Picard [41]. As a historical 
remark, this is the explanation of why in some sources Banach’s fixed point theorem 
(the contraction mapping principle) was announced as Banach-Picard theorem or 
Banach-Caccioppoli theorem. 

To return to the main subject and recap the topic, the most significant results in metric fixed point theory were reported by Banach [8] in the setting of complete normed spaces and by Caccioppoli [15] in the framework of metric spaces. Since then, this idea has been extended by changing the structure of the abstract space, generalized by changing the contraction condition, or improved by applying both approaches.

In this note we restrict ourselves to investigating the existence and uniqueness of a fixed point of operators that satisfy distinct contraction conditions. More precisely, we examine fixed point results for mappings with a contractive iterate at a point. In what follows we recollect the basic results in this direction without proofs. Indeed, the techniques of these proofs are used in our main result, and to avoid repetition we skip the proofs of the famous results in the literature.

For our purpose, we start collecting the basic results with the formal statement of Banach's fixed point theorem.


Definition 1 ([23, 25]) Let $X$ be a nonempty set. A function $d:X\times X\to[0,\infty)$ is called a metric if it fulfills the following conditions:

$(d_1)$ $d(x,x)=0$;
$(d_2)$ $d(x,y)=d(y,x)=0\implies x=y$;
$(d_3)$ $d(x,y)=d(y,x)$;
$(d_4)$ $d(x,z)\le d(x,y)+d(y,z)$;

for every $x,y,z\in X$. In addition, the pair $(X,d)$ is called a metric space.

In some sources, the single letter $X$ is used for the metric space, implicitly referring to the pair $(X,d)$.

A self-mapping $T$, defined on a metric space $(X,d)$, is called Lipschitzian if there exists a constant $k\ge 0$ such that
$$d(Tx,Ty)\le kd(x,y),\quad\text{for all }x,y\in X. \quad (1)$$

The smallest constant $k$ which satisfies (1) is called the Lipschitz constant. We underline that a Lipschitzian map is necessarily continuous. If $k<1$, then the mapping $T$ is called a contraction.

Now, we are ready to give the formal statement of the Banach-Caccioppoli theorem:

Theorem 1 (Banach-Caccioppoli Theorem [8, 15]) Let $(X,d)$ be a complete metric space and let $T:X\to X$ be a self-mapping. If there exists a constant $k$ in $[0,1)$ such that
$$d(Tx,Ty)\le kd(x,y), \quad (2)$$
for any pair of points $x,y\in X$, then $T$ has a unique fixed point $x^*$. Moreover, the iterative sequence $\{T^nx_0\}$, for any $x_0\in X$, converges to $x^*$. Furthermore, for all $n\in\mathbb N$, we have the following estimation:
$$d(x^*,T^nx_0)\le\frac{k^n}{1-k}\,d(x_0,Tx_0), \quad (3)$$
for any $x_0\in X$.
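The iteration and the a priori estimate (3) can be illustrated on the real line. The sketch below is an illustration only; the map $T(x)=x/2+1$ (a contraction with $k=1/2$ and fixed point $x^*=2$), the starting point and the function name `picard_iterates` are all choices made here, not taken from the source.

```python
def picard_iterates(T, x0, n):
    """Return the list [x0, T(x0), T^2(x0), ..., T^n(x0)]."""
    xs = [x0]
    for _ in range(n):
        xs.append(T(xs[-1]))
    return xs


def T(x):
    """A contraction on the real line with Lipschitz constant k = 1/2."""
    return 0.5 * x + 1.0


k = 0.5
x_star = 2.0                      # the unique fixed point: T(2) = 2
x0 = 10.0
iterates = picard_iterates(T, x0, 30)

# A priori bound (3): d(x*, T^n x0) <= k^n / (1 - k) * d(x0, T x0).
first_step = abs(x0 - T(x0))
bounds = [k ** n / (1 - k) * first_step for n in range(31)]
errors = [abs(x_star - x) for x in iterates]
```

For this particular affine map the bound happens to be attained, so `errors` and `bounds` coincide up to rounding; in general the estimate is only an upper bound.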


As already mentioned, a contraction mapping is necessarily continuous. However, continuity is a heavy requirement, and most functions derived from real world problems are not continuous. A very natural question arises from this observation: is there a discontinuous mapping that possesses a fixed point? This question was answered affirmatively by Bryant [13] with the following simple but significant example:

Example 1 ([13]) Let $T:[0,2]\to[0,2]$ be defined by
$$T(x)=\begin{cases}0&\text{if }x\in[0,1],\\ 1&\text{if }x\in(1,2].\end{cases}$$
Observe that $T$ is not continuous at $1$ and hence, it is not a contraction. On the other hand, the 2nd iterate of $T$ is equal to $0$ for all $x\in[0,2]$. Consequently, the 2nd iterate of $T$, namely $T^2$, forms a contraction on $[0,2]$ with the fixed point $0$.

Henceforth, the letters $\mathbb R$ and $\mathbb N$ denote the real numbers and the positive integers, respectively. In addition, we reserve the letters $\mathbb R_0^+$ and $\mathbb N_0$ to represent the set of non-negative real numbers and the set of non-negative integers, respectively. For a self-mapping $T$ on a metric space $(X,d)$, the expression $T^p$ denotes the $p$-th iterate of $T$, where $p\in\mathbb N$.


1.1 Results in the Setting of Complete Metric Spaces 


In the light of the observations of the example above, Bryant proved a relaxed version 
of the Banach-Caccioppoli theorem as follows: 


Theorem 2 (Bryant Theorem [13]) Let $T$ be a self-mapping on the complete metric
space $(M, d)$ and $m \in \mathbb{N}$. If there exists $k \in [0, 1)$ such that

\[ d(T^m x, T^m y) \le k\, d(x, y), \tag{4} \]

for all $x, y \in M$, then there exists exactly one fixed point of $T$.

Proof On account of Theorem 1, we conclude that the $m$-th iterate of $T$, i.e. $T^m$,
possesses a unique fixed point, say $x^* \in M$. To finalize the proof, it is sufficient to
use the following elementary equality:

\[ T x^* = T(T^m x^*) = T^m(T x^*). \]

It yields that $T x^*$ is also a fixed point of $T^m$. By the uniqueness of the fixed point
of $T^m$, we necessarily have $T x^* = x^*$. Moreover, every fixed point of $T$ is a fixed point
of $T^m$, so $x^*$ is the only fixed point of $T$.
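
As an illustration of Theorem 2, the sketch below uses a hypothetical linear map on the Euclidean plane, $T(x, y) = (2y, x/8)$, which is not taken from the text. This map doubles some distances, so it is not a contraction, while its second iterate is $T^2(x, y) = (x/4, y/4)$, a contraction with $k = 1/4$ and fixed point $(0, 0)$.

```python
import math


def T(p):
    """Hypothetical map T(x, y) = (2y, x/8); an assumption for illustration."""
    x, y = p
    return (2.0 * y, x / 8.0)


def iterate(f, p, m):
    """Apply f to the point p exactly m times."""
    for _ in range(m):
        p = f(p)
    return p


def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# T itself expands the distance between these two points.
a, b = (0.0, 0.0), (0.0, 1.0)
T_expands = dist(T(a), T(b)) > dist(a, b)

# The second iterate shrinks every distance by the factor 1/4.
u, v = (1.0, 3.0), (-2.0, 0.5)
ratio_T2 = dist(iterate(T, u, 2), iterate(T, v, 2)) / dist(u, v)

# The orbit of T approaches the unique fixed point (0, 0).
orbit_end = iterate(T, (1.0, 1.0), 40)
```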


We emphasize that condition (4) does not force $T$ to be continuous, only its $m$-th
iterate.

The result of Bryant [13] is one of the first extensions of the contraction mapping
principle. Bryant's theorem can also be considered as the start of a new research
trend: the investigation of fixed points of mappings that satisfy a contractive type
condition in which the iterate depends on the point in the space. The first extension of
Bryant's theorem was proved by Sehgal [45] in 1969.


Theorem 3 (Sehgal Theorem, [45]) Let $(X, d)$ be a complete metric space, and $T :
X \to X$ be a continuous mapping satisfying the condition: there exists a $k \in [0, 1)$
such that for each $x \in X$, there is a positive integer $n(x)$ such that for all $y \in X$

\[ d(T^{n(x)}(x), T^{n(x)}(y)) \le k\, d(x, y). \tag{5} \]

Then $T$ has a unique fixed point $z$ and $T^n(x_0) \to z$ for each $x_0 \in X$.

We say that a mapping $T : X \to X$ has a contractive iterate at a point $x$ if it
satisfies (5).
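
Example 2 ([45]) Let $A_n = \left[\frac{1}{2^n}, \frac{1}{2^{n-1}}\right]$ for each $n \in \mathbb{N}$ and
$X = [0, 1] = \{0\} \cup \bigcup_{n=1}^{\infty} A_n$, equipped with the usual metric $d(x, y) := |x - y|$ for all $x, y \in X$. For
each $n \in \mathbb{N}$, we define $T : A_n \to A_{n+1}$, together with $T(0) = 0$, as

\[ T(x) = \begin{cases}
0 & \text{if } x = 0, \\[4pt]
\dfrac{n+2}{n+3}\left(x - \dfrac{1}{2^{n-1}}\right) + \dfrac{1}{2^n} & \text{if } x \in \left[\dfrac{3n+5}{2^{n+1}(n+2)}, \dfrac{1}{2^{n-1}}\right], \\[8pt]
\dfrac{1}{2^{n+1}} & \text{if } x \in \left[\dfrac{1}{2^n}, \dfrac{3n+5}{2^{n+1}(n+2)}\right].
\end{cases} \]

Here $T$ is a continuous, non-decreasing function on $X$ and $0$ is the unique fixed
point of $T$. Indeed, by letting $k = \frac{1}{2}$, for each $x \in A_n$ the required integer $n(x)$ can
be chosen as $n + 3$, where $n(0) \ge 1$. Notice that $T$ is not a contraction. Moreover,
condition (5) is weaker than the corresponding condition (4). For more details, we
refer to [45].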
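
The claims of Example 2 can be tested with exact rational arithmetic. The sketch below assumes the piecewise-linear form of Sehgal's map on $A_n = [2^{-n}, 2^{-(n-1)}]$ with breakpoint $(3n+5)/(2^{n+1}(n+2))$, slope $(n+2)/(n+3)$ above the breakpoint and the constant value $2^{-(n+1)}$ below it. The grid and the helper names are illustrative assumptions.

```python
from fractions import Fraction


def interval_index(x):
    """Smallest n >= 1 with x in A_n = [2^-n, 2^-(n-1)], for 0 < x <= 1."""
    n = 1
    while x < Fraction(1, 2**n):
        n += 1
    return n


def T(x):
    """Sehgal's map on [0, 1] in the form assumed in the lead-in."""
    if x == 0:
        return Fraction(0)
    n = interval_index(x)
    breakpoint_n = Fraction(3 * n + 5, 2**(n + 1) * (n + 2))
    if x >= breakpoint_n:
        return Fraction(n + 2, n + 3) * (x - Fraction(1, 2**(n - 1))) + Fraction(1, 2**n)
    return Fraction(1, 2**(n + 1))


def iterate(x, m):
    """Apply T to x exactly m times."""
    for _ in range(m):
        x = T(x)
    return x


def n_of(x):
    """Iterate count from the example: n + 3 on A_n, and 1 at x = 0."""
    return 1 if x == 0 else interval_index(x) + 3


# Condition (5) with k = 1/2, checked on a rational grid of [0, 1].
k = Fraction(1, 2)
grid = [Fraction(i, 64) for i in range(65)]
condition_5_holds = all(
    abs(iterate(x, n_of(x)) - iterate(y, n_of(x))) <= k * abs(x - y)
    for x in grid
    for y in grid
)


def slope_on_A(n):
    """Slope of T on the non-constant part of A_n; it tends to 1 as n grows."""
    top = Fraction(1, 2**(n - 1))
    c = Fraction(3 * n + 5, 2**(n + 1) * (n + 2))
    return (T(top) - T(c)) / (top - c)
```

Since `slope_on_A(n)` equals $(n+2)/(n+3)$, no single constant $k < 1$ bounds the slopes of $T$, which is why $T$ is not a contraction even though each point has a contractive iterate.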
In 1970, Guseman [24] refined the interesting result of Sehgal [45] by removing
the “continuity” assumption. Notice that “continuity” was not assumed in the result
of Bryant [13].


Theorem 4 (Guseman Theorem, [24]) Let $(X, d)$ be a complete metric space and
let $T : X \to X$ be a mapping. Suppose there exists $B \subset X$ such that

(a) $T(B) \subset B$,
(b) for some $k < 1$ and each $y \in B$ there is an integer $n(y) \ge 1$ with

\[ d(T^{n(y)}(x), T^{n(y)}(y)) \le k\, d(x, y) \quad \text{for all } x \in B, \]

(c) for some $x_0 \in B$, $\mathrm{cl}\{T^n(x_0) : n \ge 1\} \subset B$.

Then there is a unique $u \in B$ such that $T(u) = u$ and $T^n(y_0) \to u$ for each
$y_0 \in B$. Furthermore, if $d(T^{n(u)}(x), T^{n(u)}(u)) \le k\, d(x, u)$ for all $x \in X$, then $u$
is the unique fixed point of $T$ in $X$ and $T^n(x_0) \to u$ for each $x_0 \in X$.


Theorem 5 (Guseman Uniqueness Theorem, [24]) Let $(X, d)$ be a metric space,
$T : X \to X$ a mapping, and $u, x_0 \in X$ with $T^n(x_0) \to u$. If $T$ has a contractive
iterate at the point $u$, then $T$ has a unique fixed point $u \in X$ and $T^n(y) \to u$ for each $y \in X$.


Theorem 6 (Matkowski Theorem, [38]) Let $(X, d)$ be a complete metric space, $T :
X \to X$ and $\alpha : [0, \infty)^5 \to [0, \infty)$ with $\gamma(t) = \alpha(t, t, t, 2t, 2t)$ for $t \ge 0$. Suppose
that:

(i) $\alpha$ is nondecreasing with respect to each variable,
(ii) $\lim_{t \to \infty} (t - \gamma(t)) = \infty$,
(iii) $\lim_{n \to \infty} \gamma^n(t) = 0$, $t > 0$,
(iv) for every $x \in X$, there exists a positive integer $n(x)$ such that for all $y \in X$,

\[ d(T^{n(x)}(x), T^{n(x)}(y)) \le \Gamma_\alpha(x, y), \tag{6} \]

where

\[ \Gamma_\alpha(x, y) = \alpha\bigl(d(x, T^{n(x)}(x)),\, d(x, T^{n(x)}(y)),\, d(x, y),\, d(T^{n(x)}(x), y),\, d(T^{n(x)}(y), y)\bigr). \]

Then $T$ has a unique fixed point $x^*$ and $T^n(x) \to x^*$ for each $x \in X$.


Lemma 1 ([38]) Suppose that $\gamma : (0, \infty) \to [0, \infty)$ is a non-decreasing function
such that $\lim_{n \to \infty} \gamma^n(t) = 0$ for $t > 0$. Then we have $\gamma(t) < t$.


Another generalization in this direction was given by Ray and Rhoades [43] in
1975 by considering a common fixed point approach.


Theorem 7 (Ray and Rhoades [43]) Let $T, S$ be self-mappings of a complete metric
space $(X, d)$. Suppose that there exists a constant $k$, $0 < k < 1$, and that for each
$x, y \in X$ there exist positive integers $n(x)$, $m(y)$ such that

\[ d(T^{n(x)} x, S^{m(y)} y) \le k \max\left\{ d(x, y),\, d(x, T^{n(x)} x),\, d(y, S^{m(y)} y),\, \frac{d(x, S^{m(y)} y) + d(y, T^{n(x)} x)}{2} \right\}. \tag{7} \]

Then $T$ and $S$ have a unique common fixed point.

In 1979, Singh [46] proposed the following generalization:


Theorem 8 (Singh [46]) Let $(X, d)$ be a complete metric space and $T : X \to X$ be
a mapping such that for each $x \in X$, there exists an integer $n(x) \ge 1$ such that for
all $y \in X$

\[ \begin{aligned} d(T^{n(x)}(x), T^{n(x)}(y)) \le{}& q(x, y)\, d(x, y) + r(x, y)\, d(x, T^{n(x)}(x)) + s(x, y)\, d(y, T^{n(x)}(y)) \\ &+ t(x, y)\, d(x, T^{n(x)}(y)) + m(x, y)\, d(y, T^{n(x)}(x)), \end{aligned} \tag{8} \]

where $q(x, y)$, $r(x, y)$, $s(x, y)$, $t(x, y)$, and $m(x, y)$ are nonnegative functions such
that $\sup_{x, y \in X} \{2(m(x, y) + s(x, y) + t(x, y)) + q(x, y) + r(x, y)\} = \lambda < 1$. Then
$T$ has a unique fixed point $z$ and $T^n(x_0) \to z$ for each $x_0 \in X$.


In 1983, Ćirić [18] proved the following significant theorem:

Theorem 9 (Ćirić [18]) Let $(X, d)$ be a complete metric space, $k \in [0, 1)$ and
$T : X \to X$. If for each $x \in X$ there exists a positive integer $n = n(x)$ such that

\[ d(T^n x, T^n y) \le k \cdot \max\{d(x, y), d(x, Ty), \ldots, d(x, T^n y), d(x, T^n x)\} \tag{9} \]

holds for all $y \in X$, then $T$ has a unique fixed point $z \in X$. Moreover, for every
$x \in X$, $z = \lim_{m \to \infty} T^m x$.


In 1990, Kincses and Totik [34] proposed the following results in the setting of
compact metric spaces for continuous mappings:


Theorem 10 (Kincses and Totik [34], Theorem 3) Let $X$ be compact and $T$ continuous.
If for each $x \in X$ there is a natural number $p(x)$ such that for every $y \in X$, $y \ne x$,
we have

\[ d(T^{p(x)}(x), T^{p(x)}(y)) < \max\{d(x, T^{p(x)}(y)),\, d(y, T^{p(x)}(x)),\, d(y, x)\}, \tag{10} \]

then $T$ has a fixed point $z$ and $T^n(x_0) \to z$ for every $x_0 \in X$.


In the same paper, Kincses and Totik [34] also obtained the following two results
in the setting of complete metric spaces.


Theorem 11 (Kincses and Totik [34], Theorem 4) Let $T$ be a self-mapping on a
complete metric space $(X, d)$. If $a < 1$ and to every $x \in X$ there is a natural number
$p(x)$ such that

\[ d(T^{p(x)}(x), T^{p(x)}(y)) \le a \max\{d(x, T^{p(x)}(y)),\, d(y, T^{p(x)}(x)),\, d(y, x)\} \tag{11} \]

holds for every $y \in X$, then $T$ has a fixed point $z$ and $T^n(x_0) \to z$ for every $x_0 \in X$.

Theorem 12 (Kincses and Totik [34], Theorem 6) Let $T$ be a self-mapping on a
complete metric space $(X, d)$. Suppose that there is a monotone decreasing function
$\alpha : (0, \infty) \to [0, 1)$ and for each $x \in X$ an integer $n(x)$ such that for all $y \in X$,
$y \ne x$, we have

\[ d(T^{n(x)}(x), T^{n(x)}(y)) \le \alpha(d(x, y))\, d(x, y). \tag{12} \]

Then $T$ has a fixed point $z$ and $T^n(x_0) \to z$ for every $x_0 \in X$.


In 1995, Jachymski [26] gave the latest result in the setting of a complete metric
space:


Theorem 13 (Jachymski [26]) Let $(X, d)$ be a complete metric space, $T :
X \to X$ a function and $p : X \to \mathbb{N}$ such that for all $x, y \in X$

\[ d(T^{p(x)} x, T^{p(x)} y) \le \varphi\bigl[\max\{d(x, T^j y),\, d(T^{p(x)} x, T^j y) : j = 0, 1, 2, \ldots, p(x)\}\bigr], \tag{13} \]

where $\varphi$ is a nondecreasing function such that $\varphi(t) < t$ for all $t > 0$ and
$\lim_{t \to \infty}(t - \varphi(t)) = \infty$. Then $T$ has a contractive fixed point $x^*$.


Corollary 1 (Jachymski [26]) Let $T$ be a selfmap of a complete metric space $(X, d)$
and let $p : X \to \mathbb{N}$ be such that for all $x, y \in X$

\[ d(T^{p(x)} x, T^{p(x)} y) \le I(x, y), \tag{14} \]

where

\[ I(x, y) = \varphi\bigl[\max\{d(x, T^{p(x)}(y)),\, d(y, T^{p(x)}(x)),\, d(y, x)\}\bigr] \]

and $\varphi$ is a nondecreasing function such that $\varphi(t) < t$ for all $t > 0$ and
$\lim_{t \to \infty}(t - \varphi(t)) = \infty$. Then $T$ has a contractive fixed point $x^*$.


Corollary 2 (Jachymski [26]) Let $T$ be a selfmap of a complete metric space $(X, d)$
and let $p : X \to \mathbb{N}$ be such that for all $x, y \in X$

\[ d(T^{p(x)} x, T^{p(x)} y) \le I(x, y), \tag{15} \]

where $a, b, c \in \mathbb{R}_0^+$ and $4a + 2b + c < 1$ and

\[ I(x, y) = a\bigl[d(x, T^{p(x)}(x)) + d(y, T^{p(x)}(y))\bigr] + b\bigl[d(x, T^{p(x)}(y)) + d(y, T^{p(x)}(x))\bigr] + c\, d(y, x). \]

Then $T$ has a contractive fixed point $x^*$.


1.2 Results in the Setting of Complete b-Metric Spaces 


In this section, we shall consider the extension of the existing common fixed point
results to a more general abstract structure, namely, b-metric spaces. Indeed, the
concept of a b-metric is very natural, which is why it has appeared in several papers,
such as Bourbaki [9], Bakhtin [10], Czerwik [19], Heinonen [21] and many others.
In brief, the b-metric is obtained by replacing the triangle inequality of the metric,

(T) $d(x, y) \le d(x, z) + d(z, y)$, for every $x, y, z \in X$,

with the inequality

(S) $d(x, y) \le s[d(x, z) + d(z, y)]$, for every $x, y, z \in X$,

for a fixed $s \ge 1$. The inequality (S) can be called the s-weighted triangle inequality.
In this case, the triple $(X, d, s)$ is termed a b-metric space. It is clear that $(X, d, s)$
forms a standard metric space in the case $s = 1$.
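
To make the s-weighted triangle inequality concrete, the sketch below checks a standard example that is not taken from the text: the squared distance $d(x, y) = (x - y)^2$ on the real line, which violates (T) but satisfies (S) with $s = 2$. The sample points are an arbitrary choice.

```python
from itertools import product


def d(x, y):
    """Squared distance on the real line (illustrative b-metric, an assumption)."""
    return (x - y) ** 2


s = 2
points = [i / 4 for i in range(-8, 9)]  # sample points in [-2, 2]

# The ordinary triangle inequality (T) fails for some triple, e.g. x=0, y=2, z=1.
triangle_fails = any(
    d(x, y) > d(x, z) + d(z, y) for x, y, z in product(points, repeat=3)
)

# The s-weighted triangle inequality (S) holds with s = 2 on every sampled triple.
s_triangle_holds = all(
    d(x, y) <= s * (d(x, z) + d(z, y)) for x, y, z in product(points, repeat=3)
)
```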


Definition 2 By a comparison function we mean a function $\varphi : [0, \infty) \to [0, \infty)$
with the following properties:

(cf1) $\varphi$ is increasing;
(cf2) $\lim_{n \to \infty} \varphi^n(t) = 0$, for $t \in [0, \infty)$.

We denote by $\Phi$ the class of comparison functions $\varphi : [0, \infty) \to [0, \infty)$.
Next we list some basic properties of comparison functions.

Proposition 1 ([44]) If $\varphi$ is a comparison function then:

(cf$_i$) each $\varphi^k$ is a comparison function, for all $k \in \mathbb{N}$;
(cf$_{ii}$) $\varphi$ is continuous at $0$;
(cf$_{iii}$) $\varphi(t) < t$ for all $t > 0$.


Definition 3 ([44]) A function $\varphi_c : [0, \infty) \to [0, \infty)$ is called a c-comparison
function if:

(ccf1) $\varphi_c$ is monotone increasing;
(ccf2) $\sum_{n=0}^{\infty} \varphi_c^n(t) < \infty$, for all $t \in (0, \infty)$.

We denote by $\Phi_c$ the family of c-comparison functions.
It can be shown that every c-comparison function is a comparison function.
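
The converse inclusion fails in general. A standard illustration, which is an assumption of this sketch and not taken from the text, is $\varphi(t) = t/(1+t)$: its iterates are $\varphi^n(t) = t/(1+nt)$, which tend to $0$, while their sum at $t = 1$ is the divergent harmonic series.

```python
from fractions import Fraction


def phi(t):
    """Illustrative comparison function phi(t) = t / (1 + t) (an assumption)."""
    return t / (1 + t)


def iterate(f, t, n):
    """Apply f to t exactly n times."""
    for _ in range(n):
        t = f(t)
    return t


t0 = Fraction(1)
# iterates[n] = phi^n(1) = 1 / (n + 1), computed exactly.
iterates = [iterate(phi, t0, n) for n in range(201)]

# Partial sum of the series in (ccf2); it grows like log(n) and does not converge.
partial_sum = sum(iterates)
```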


Definition 4 ([44]) A function $\varphi : [0, \infty) \to [0, \infty)$ is called a b-comparison
function if:

(bcf1) $\varphi$ is monotone increasing;
(bcf2) $\sum_{n=0}^{\infty} s^n \varphi^n(t) < \infty$, for all $t \in (0, \infty)$ and $s \ge 1$ a real number.

We denote by $\Phi_b$ the family of b-comparison functions.
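
As a worked instance of the summability condition for b-comparison functions, consider the linear function $\varphi(t) = a t$ with $a \ge 0$ (an illustrative choice). The series is geometric, so the condition reduces to $s a < 1$:

```latex
\varphi(t) = a t \;\Longrightarrow\; \varphi^{n}(t) = a^{n} t,
\qquad
\sum_{n=0}^{\infty} s^{n} \varphi^{n}(t)
  = t \sum_{n=0}^{\infty} (s a)^{n}
  = \frac{t}{1 - s a} < \infty
\quad \text{whenever } 0 \le s a < 1 .
```

For $t > 0$ and $s a \ge 1$ the series diverges, so a linear function is a b-comparison function for the constant $s$ exactly when $a < 1/s$.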


Note that any b-comparison function is a comparison function.

Theorem 14 (Bota [14]) Let $(X, d, s)$ be a complete b-metric space with $s > 1$ and
$T : X \to X$ a mapping which satisfies the condition: there exists an $a \in (0, 1)$ such
that for each $x \in X$ there is a positive integer $n(x)$ such that for all $y \in X$

\[ d(T^{n(x)}(x), T^{n(x)}(y)) \le a\, d(x, y). \tag{16} \]

Then:

(i) $T$ has a unique fixed point $z \in X$ and $T^n(x_0) \to z$ for each $x_0 \in X$, as $n \to \infty$.
If, in addition, the b-metric is continuous we have:
(ii) d(xo, z) < sd(xo, T"™ (x0)) + Tag (X0), for each xo € X. 


(iii) Let $g : X \to X$ be such that there exists $\eta > 0$ with

\[ d(T^{n(x)}(x), g(x)) \le \eta, \quad \text{for any } x \in X. \]


Then 


d(z,u)<s-nt+ -r(u) 


l—sa 
for all $u \in \mathrm{Fix}(g)$.


Definition 5 ([6]) Let $T, S$ be self-mappings on a b-metric space $(X, d, s)$. We say
that $T, S$ form an A-contraction if there exists $c \in (0, \frac{1}{s})$ such that for each $p, q \in X$,
there exist positive integers $n(p)$, $m(q)$ such that:

\[ d(T^{n(p)} p, S^{m(q)} q) \le c\, E(p, q), \tag{17} \]

where:

\[ E(p, q) = d(p, q) + \bigl|d(p, T^{n(p)} p) - d(q, S^{m(q)} q)\bigr|. \]


Theorem 15 If $T, S$ form an A-contraction on a complete b-metric space $(X, d, s)$,
then $T$ and $S$ possess a unique common fixed point.


Theorem 16 ([28]) Let $T, S$ be two self-mappings on a complete b-metric space $(X, d, s)$.
Suppose that for any $x, y \in X$ there exist positive integers $p(x)$, $q(y)$, and that there
exist ~) € YW and an upper semicontinuous pp € ®, such that 


A(TPO x, S4(¥)) <o (max {ae y), d(x, TP x), d(y, SIM y), dy, aes $19) y) 


+0) (min {acx, TPx), d(y, 51 y), dy, TPx), dex, $1 y)}) 
(18) 


Then the pair of functions $T, S$ has exactly one common fixed point $x^*$.


1.3 Results in the Setting of Complete Dislocated b-Metric Spaces


Definition 6 Suppose that $X$ is not empty. A dislocated metric is a function $\delta :
X \times X \to [0, \infty)$ such that for all $x, y, z \in X$:

($\delta$1) $\delta(x, y) = 0 \Rightarrow x = y$,
($\delta$2) $\delta(x, y) = \delta(y, x)$,
($\delta$3) $\delta(x, y) \le \delta(x, z) + \delta(z, y)$.

The pair $(X, \delta)$ is called a dislocated metric space, in short DMS.
In what follows, we shall consider the unification of the above-mentioned notions:


Definition 7 Suppose that $X$ is not empty and $s \ge 1$ is given. A dislocated b-metric
is a function $\delta_d : X \times X \to [0, \infty)$ such that for all $x, y, z \in X$:

($\delta$b1) $\delta_d(x, y) = 0 \Rightarrow x = y$,
($\delta$b2) $\delta_d(x, y) = \delta_d(y, x)$,
($\delta$b3) $\delta_d(x, y) \le s[\delta_d(x, z) + \delta_d(z, y)]$.

The triple $(X, \delta_d, s)$ is said to be a dislocated b-metric space, in short b-DMS.


Example 3 ([28]) Let $X = \mathbb{R}_0^+$ and $\delta_d : X \times X \to [0, \infty)$ be defined by $\delta_d(x, y) =
|x - y|^2 + \max\{x, y\}$. Then $X$ with $\delta_d$ is a dislocated b-metric space with $s = 2$.
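
The axioms of a dislocated b-metric can be spot-checked for this example. The sketch below assumes the function has the form $|x - y|^2 + \max\{x, y\}$ on the non-negative reals and tests the three axioms with $s = 2$ on a small sample; the sample points are an arbitrary choice. It also shows that the self-distance of a nonzero point is not zero, so the function is not a b-metric.

```python
from itertools import product


def delta(x, y):
    """Assumed form of the example: |x - y|^2 + max{x, y} on [0, inf)."""
    return abs(x - y) ** 2 + max(x, y)


s = 2
points = [i / 2 for i in range(11)]  # sample of non-negative reals 0, 0.5, ..., 5

# Symmetry of delta.
symmetric = all(delta(x, y) == delta(y, x) for x, y in product(points, repeat=2))

# delta(x, y) = 0 forces x = y (only the pair (0, 0) has distance 0 here).
zero_implies_equal = all(
    x == y for x, y in product(points, repeat=2) if delta(x, y) == 0
)

# s-weighted triangle inequality with s = 2.
s_triangle_holds = all(
    delta(x, y) <= s * (delta(x, z) + delta(z, y))
    for x, y, z in product(points, repeat=3)
)

# Self-distance of a nonzero point: delta(1, 1) = 1, not 0.
self_distance = delta(1.0, 1.0)
```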


It is obvious that b-metric spaces are b-DMS, but the converse is not true.
The topology of a dislocated b-metric space $(X, \delta_d, s)$ is generated by the family
of open balls

\[ B(x, r) = \{y \in X : |\delta_d(x, y) - \delta_d(x, x)| < r\}, \quad \text{for all } x \in X \text{ and } r > 0. \]

On a b-DMS $(X, \delta_d, s)$, a sequence $\{x_n\}$ in $X$ is called convergent to a point $x \in X$
if the limit

\[ \lim_{n \to \infty} \delta_d(x_n, x) = \delta_d(x, x) \tag{19} \]

exists and is finite. In addition, if the limit

\[ \lim_{n, m \to \infty} \delta_d(x_n, x_m) \]

exists and is finite, we say that the sequence $\{x_n\}$ is Cauchy. Moreover, if
$\lim_{n, m \to \infty} \delta_d(x_n, x_m) = 0$, then we say that $\{x_n\}$ is a 0-Cauchy sequence.

Definition 8 The b-DMS $(X, \delta_d, s)$ is complete if for each Cauchy sequence $\{x_n\}$
in $X$, there is some $x \in X$ such that

\[ L = \lim_{n \to \infty} \delta_d(x_n, x) = \delta_d(x, x) = \lim_{n, m \to \infty} \delta_d(x_n, x_m). \tag{20} \]

Moreover, a b-DMS $(X, \delta_d, s)$ is said to be 0-complete if each 0-Cauchy
sequence $\{x_n\}$ converges to a point $x \in X$ so that $L = 0$ in (20).

Let $(X, \delta_d, s)$ be a b-DMS. A mapping $f : X \to X$ is continuous if $\{f x_n\}$
converges to $f x$ for any sequence $\{x_n\}$ in $X$ that converges to $x \in X$.


Proposition 2 [2] Let $(X, \delta_d, s)$ be a b-DMS and $\{x_n\}$ be a sequence in $X$ such that
$\lim_{n \to \infty} \delta_d(x_n, x) = 0$. Then,

(i) $x$ is unique;

(ii) $\frac{1}{s}\delta_d(x, y) \le \lim_{n \to \infty} \delta_d(x_n, y) \le s\, \delta_d(x, y)$, for all $y \in X$.

Proposition 3 [2] Let $(X, \delta_d, s)$ be a b-DMS. For any $x, y \in X$,

(i) if $\delta_d(x, y) = 0$ then $\delta_d(x, x) = \delta_d(y, y) = 0$;
(ii) if $x \ne y$ then $\delta_d(x, y) > 0$;
(iii) if $\{x_n\}$ is a sequence in $X$ such that $\lim_{n \to \infty} \delta_d(x_n, x_{n+1}) = 0$, then

\[ \lim_{n \to \infty} \delta_d(x_n, x_n) = \lim_{n \to \infty} \delta_d(x_{n+1}, x_{n+1}) = 0. \]

Theorem 17 Let $(X, \delta_d, s)$ be a 0-complete b-DMS and $T, S : (X, \delta_d, s) \to (X, \delta_d, s)$
be two functions. Let the function $\varphi \in \Phi_b$. Suppose that for all $x, y \in X$ we can
find positive integers $p(x)$, $q(y)$ such that

, ; 51 (y, TPO) ay) y 
By(TPs, $109) = (max | Bal 9), Bales PPO), bu, S19), “al ete 2) 


(21) 
Then the pair of functions $T, S$ has exactly one common fixed point $x^*$.


From now on, we denote by $\Psi$ the collection of all c-comparison functions $\psi :
[0, \infty) \to [0, \infty)$ that satisfy the following condition:

(ccf3) $\lim_{x \to \infty} (x - \psi(x)) = \infty$.


In the following we recall the concepts of $\alpha$-orbital admissible mappings and triangular
$\alpha$-orbital admissible mappings [42]:


Definition 9 ([42]) Let $T : X \to X$ and $\alpha : X \times X \to [0, \infty)$. We say that $T$ is
an $\alpha$-orbital admissible mapping if for all $x \in X$ we have

(O) $\alpha(x, Tx) \ge 1 \Rightarrow \alpha(Tx, T^2 x) \ge 1$.

Every $\alpha$-admissible mapping is an $\alpha$-orbital admissible mapping. For more details
on admissible mappings, see e.g. [1, 3-5, 7].


Definition 10 ([42]) Let $\alpha : X \times X \to [0, \infty)$. An $\alpha$-orbital admissible function
$T : X \to X$ is said to be triangular $\alpha$-orbital admissible if it satisfies

(TO) $\alpha(x, y) \ge 1$ and $\alpha(y, Ty) \ge 1$ implies that $\alpha(x, Ty) \ge 1$, for all $x, y \in X$.

At the end of this section, we present two further concepts that will be essential
in our next considerations.

A set $X$ is regular with respect to a mapping $\alpha : X \times X \to [0, \infty)$ if the following
condition is satisfied:

(R) for any sequence $\{x_n\}$ in $X$ such that $\alpha(x_n, x_{n+1}) \ge 1$ for all $n$ and $x_n \to x \in X$
as $n \to \infty$, we have $\alpha(x, x_n) \ge 1$ for all $n$.

A map $\alpha : X \times X \to [0, \infty)$ is said to satisfy the condition (U) if

(U) for any fixed point $x$ of $T^{m(x)}$ we have $\alpha(x, y) \ge 1$ for any $y \in X$, where $m(x)$
is a positive integer.


Theorem 18 Let $(X, \delta)$ be a complete DMS, $T : X \to X$ a function, $\psi \in \Psi$ and
$\alpha : X \times X \to [0, \infty)$. Suppose that for all $x \in X$ we can find a positive integer
$m(x)$ such that for any $w \in X$

\[ \alpha(x, w)\, \delta(T^{m(x)} x, T^{m(x)} w) \le M(x, w), \tag{22} \]

where


2 3 


- m(x) m(x) m(x) m(x) 
eee (ns {s w), 3 x) Sw, Tw) Sew, TM x) + 6x, T™ Mw) \ 


Suppose also that:

(i) $T$ is triangular $\alpha$-orbital admissible;
(ii) there exists $x_0$ in $X$ such that $\alpha(x_0, T x_0) \ge 1$;
(iii) either $T$ is continuous, or
(iv) the space $X$ is regular and $\alpha$ satisfies the condition (U).

Then the function $T$ has exactly one fixed point.


1.4 Branciari b-Metric Space 


Definition 11 ([11, 12]) Let $X$ be a non-empty set and $d : X \times X \to [0, \infty)$ be a
mapping such that for all $x, y \in X$ and for all distinct points $u, v \in X$, each of them
different from $x$ and $y$, one has

(i) $d(x, y) = 0$ if and only if $x = y$,
(ii) $d(x, y) = d(y, x)$,
(iii) $d(x, y) \le d(x, u) + d(u, v) + d(v, y)$ (quadrilateral inequality).

Then $(X, d)$ is called a Branciari distance space (or BDS).


Branciari distance spaces have been named differently in the literature, e.g. generalized
metric space, rectangular metric space, Branciari metric space and so on, see e.g.
[17, 20, 29-33] and the corresponding references therein. Recently, [47] indicated
that the ordinary (standard) metric and the Branciari distance function $d$, defined in
Definition 11, are not compatible. Accordingly, we conclude that the Branciari distance
function is not a generalization of the ordinary metric. More precisely, it is not possible
to find a proper condition that reduces the Branciari distance function $d$, defined in
Definition 11, to a metric. Indeed, this observation leads us to use “Branciari distance”
instead of “Branciari metric” or “generalized metric”.

Let $\{x_n\}$ be a sequence in a Branciari distance space $(X, d)$. We say that $\{x_n\}$
converges to some $x \in X$ if $d(x_n, x) \to 0$ as $n \to \infty$. The sequence $\{x_n\}$ is called
Cauchy if $d(x_n, x_m) \to 0$ as $n, m \to \infty$. Moreover, if every Cauchy sequence in $X$
is convergent in $X$, then $(X, d)$ is called a complete Branciari distance space.

A self-mapping $T$ on a Branciari distance space $(X, d)$ is continuous if, for each
sequence $\{x_n\}$ in $X$, $d(x_n, x) \to 0$ as $n \to \infty$ implies that $d(T x_n, T x) \to 0$
as $n \to \infty$.


Remark 1
1. The topologies induced by a metric and by a Branciari distance are incompatible;
2. a Branciari distance function is not necessarily continuous;
3. the topology of a Branciari distance space need not be Hausdorff;
4. an open ball in the setting of BDS is not necessarily an open set;
5. a convergent sequence in BDS need not be Cauchy.
For more details, see e.g. [17, 20, 29-33].


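Example 4 ([22]) Let $X = A \cup B$ where $A = \{a, b, c\}$ and $B = \{\frac{1}{n} : n \in \mathbb{N}\}$. Define
$d : X \times X \to [0, \infty)$ satisfying $d(x, y) = d(y, x)$ as follows:

\[ d(x, y) = \begin{cases} y, & \text{if } x \in \{b, c\},\ y \in B, \\ 0, & \text{if } x = y, \\ 1, & \text{otherwise}. \end{cases} \]

We observe that, for example,

\[ d\left(\tfrac{1}{2}, \tfrac{1}{4}\right) = 1 > \tfrac{3}{4} = d\left(\tfrac{1}{2}, b\right) + d\left(b, \tfrac{1}{4}\right), \]

which shows that $d$ is not a metric, but it is easy to see that $(X, d)$ is a
BDS. In addition, we can observe that

\[ \lim_{n \to \infty} d\left(\tfrac{1}{n}, b\right) = \lim_{n \to \infty} d\left(\tfrac{1}{n}, c\right) = \lim_{n \to \infty} \tfrac{1}{n} = 0, \]

hence the sequence $\{\frac{1}{n}\}$ converges to both $b$ and $c$.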
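
The properties claimed in Example 4 can be checked on a finite truncation of $X$. The sketch below assumes the distance equals $y$ when one argument is $b$ or $c$ and the other is $y \in B$, equals 0 on the diagonal, and equals 1 otherwise. The truncation level `N` and the string labels for $a, b, c$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

N = 8                                   # truncation level (an assumption)
A = ["a", "b", "c"]
B = [Fraction(1, n) for n in range(1, N + 1)]
X = A + B


def d(x, y):
    """Symmetric Branciari distance of the example, in the assumed form."""
    if x == y:
        return Fraction(0)
    if x in ("b", "c") and isinstance(y, Fraction):
        return y
    if y in ("b", "c") and isinstance(x, Fraction):
        return x
    return Fraction(1)


# The triangle inequality fails: d(1/2, 1/4) = 1 > 3/4 = d(1/2, b) + d(b, 1/4).
lhs = d(Fraction(1, 2), Fraction(1, 4))
rhs = d(Fraction(1, 2), "b") + d("b", Fraction(1, 4))

# The quadrilateral inequality holds for all pairwise distinct x, u, v, y.
quadrilateral_holds = all(
    d(x, y) <= d(x, u) + d(u, v) + d(v, y)
    for x, u, v, y in permutations(X, 4)
)

# The distances from 1/n to b and to c are both 1/n, so both are limits of {1/n}.
same_limits = all(d(t, "b") == t and d(t, "c") == t for t in B)
```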


The following lemma will be needed in the sequel.

Lemma 2 ([30]) Let $\{x_n\}$ be a Cauchy sequence in a Branciari distance space
$(X, d)$ such that $\lim_{n \to \infty} d(x_n, x) = 0$, where $x \in X$. Then $\lim_{n \to \infty} d(x_n, y) = d(x, y)$ for
all $y \in X$. In particular, the sequence $\{x_n\}$ does not converge to $y$ if $y \ne x$.


Theorem 19 (Mitrović [39]) Let $(X, d, s)$ be a complete Branciari b-metric space
and $T : X \to X$ a mapping satisfying the condition: for each $x \in X$ there exists
$p(x) \in \mathbb{N}$ such that

\[ d(T^{p(x)}(y), T^{p(x)}(x)) \le k\, d(y, x) \tag{23} \]

for all $y \in X$, where $k \in (0, 1)$. Then $T$ has a unique fixed point, say $z \in X$, and
$T^n x \to z$ for each $x \in X$.


2 Extension of Matkowski Theorem 


The main aim of this paper is to establish the analog of the results of Matkowski [38]
in the context of complete b-metric spaces.


Theorem 20 Let $(X, d, s)$ be a complete b-metric space with $s \ge 1$ and let $\gamma$ be an
auxiliary function that satisfies (ccf1), (ccf2) and (ccf3). Suppose that $\alpha : [0, \infty)^5 \to
[0, \infty)$ is a mapping that is nondecreasing with respect to each component and satisfies
$\gamma(t) = \alpha(t, t, t, 2st, 2st)$ for $t \ge 0$. If a self-mapping $T : X \to X$ satisfies the
condition: for each $x \in X$ there exists a positive integer $n(x)$ such that, for all $y \in X$,

\[ d(T^{n(x)}(x), T^{n(x)}(y)) \le \frac{1}{s}\, \Gamma_\alpha(x, y), \tag{24} \]

where

\[ \Gamma_\alpha(x, y) = \alpha\bigl(d(x, T^{n(x)}(x)),\, d(x, T^{n(x)}(y)),\, d(x, y),\, d(T^{n(x)}(x), y),\, d(T^{n(x)}(y), y)\bigr), \]

then $T$ has a unique fixed point $x^*$ and $T^n(x) \to x^*$ for each $x \in X$.


Proof The proof consists of several steps. In the first step, we shall show that for
each $x \in X$, the corresponding orbit $\{T^i x\}$ is bounded. For an arbitrary $x \in X$, we
fix a positive integer $r$ with $r < n := n(x)$ and define

\[ \delta_k := d(x, T^{kn+r} x) \ \text{ for all } k \in \mathbb{N}_0, \qquad \kappa := \max\{\delta_0, d(x, T^n x)\}. \]

On account of (ccf3), there exists a constant $C$ with $C > \kappa$ such that

\[ t - \gamma(t) > s\kappa, \quad \text{for } t > C. \]

Notice that $C > \delta_0$. Now, we suppose that there exists a positive integer $p$ such that

\[ \delta_p = d(x, T^{pn+r} x) > C. \]

We may presume that $\delta_i = d(x, T^{in+r} x) \le C$ for $i < p$. Now, by the
s-weighted triangle inequality, we have

\[ d(T^n x, T^{(p-1)n+r} x) \le s\bigl(d(x, T^n x) + d(x, T^{(p-1)n+r} x)\bigr) = s\bigl(d(x, T^n x) + \delta_{p-1}\bigr) \le 2s\delta_p, \]
\[ d(T^{pn+r} x, T^{(p-1)n+r} x) \le s(\delta_p + \delta_{p-1}) \le 2s\delta_p. \]

Since $\alpha$ is nondecreasing with respect to each component, (24) yields that

\[ \begin{aligned} \delta_p = d(x, T^{pn+r} x) &\le s\bigl[d(x, T^n x) + d(T^n x, T^{pn+r} x)\bigr] \\ &= s\bigl[d(x, T^n x) + d(T^n x, T^n(T^{(p-1)n+r} x))\bigr] \\ &\le s\Bigl[\kappa + \frac{1}{s}\, \alpha(\delta_p, \delta_p, \delta_p, 2s\delta_p, 2s\delta_p)\Bigr] \\ &= s\kappa + \gamma(\delta_p), \end{aligned} \]

that is, $\delta_p - \gamma(\delta_p) \le s\kappa$. This contradicts the choice of $\delta_p > C$, and hence $\delta_p \le C$
for all $p \in \mathbb{N}_0$. Consequently, the orbit $\{T^i x\}$ is bounded.

In the second step, we construct a recursive sequence $\{x_k\}$ starting from an arbitrary
initial point $x := x_0$:

\[ x_{k+1} = T^{n_k} x_k, \quad \text{where } n_k := n(x_k),\ k \in \mathbb{N}_0. \tag{25} \]

It is clear that the sequence $\{x_k\}$ is a subsequence of the orbit $\{T^i x\}$.
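In the next step we assert that $\{x_k\}$ is a Cauchy sequence. For this purpose, consider
positive integers $q, k$. On account of (25), we observe that

\[ x_{k+q} = T^{n_{k+q-1}} x_{k+q-1} = T^{n_{k+q-1} + n_{k+q-2} + \cdots + n_k} x_k. \]

Let $m_0 := n_{k+q-1} + n_{k+q-2} + \cdots + n_{k+1} + n_k$, which yields

\[ d(x_k, x_{k+q}) = d(x_k, T^{m_0} x_k). \]

We also set $\tau_i := d(x_{k-1}, T^i x_{k-1})$ and let $m_1^*$ be an integer from the set
$\{m_0, n_{k-1}, m_0 + n_{k-1}\}$ such that $\tau_{m_1^*}$ has the greatest value. Again by the s-weighted triangle
inequality, we find

\[ d(T^{n_{k-1}} x_{k-1}, T^{m_0} x_{k-1}) \le s(\tau_{n_{k-1}} + \tau_{m_0}) \le 2s\tau_{m_1^*}, \]
\[ d(T^{n_{k-1}+m_0} x_{k-1}, T^{m_0} x_{k-1}) \le s(\tau_{n_{k-1}+m_0} + \tau_{m_0}) \le 2s\tau_{m_1^*}. \]

Keeping in mind that $\alpha$ is monotone in each component, (24) implies that

\[ \begin{aligned} d(x_k, T^{m_0} x_k) &= d(T^{n_{k-1}} x_{k-1}, T^{m_0 + n_{k-1}} x_{k-1}) \\ &\le \frac{1}{s}\, \alpha(\tau_{m_1^*}, \tau_{m_1^*}, \tau_{m_1^*}, 2s\tau_{m_1^*}, 2s\tau_{m_1^*}) = \frac{1}{s}\, \gamma(\tau_{m_1^*}) \le \gamma(\tau_{m_1^*}). \end{aligned} \]

So, we have

\[ d(x_k, T^{m_0} x_k) \le \gamma\bigl(d(x_{k-1}, T^{m_1^*} x_{k-1})\bigr). \]

Recursively, we derive positive integers $m_j^*$, $j \in \mathbb{N}$, so that

\[ d(x_{k-j}, T^{m_j^*} x_{k-j}) \le \gamma\bigl(d(x_{k-j-1}, T^{m_{j+1}^*} x_{k-j-1})\bigr). \]

Since $\gamma$ is non-decreasing, we find

\[ d(x_k, x_{k+q}) \le \gamma^k\bigl(d(x_0, T^{m_k^*} x_0)\bigr) \le \gamma^k(D), \]

where $D$ is the diameter of the orbit $\{T^i x\}$. Taking the property (bcf2) of $\gamma$, we have

\[ \lim_{k \to \infty} \gamma^k(D) = 0, \]

that is, $\{x_k\}$ is a Cauchy sequence. Since $X$ is complete, there exists $x^* \in X$ such that
$\lim_{k \to \infty} x_k = x^*$.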
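In the following step, we show that

\[ T^n x^* = x^*, \quad \text{where } n := n(x^*). \]

Suppose, on the contrary, that $\varepsilon^* = d(T^n x^*, x^*) > 0$. On the other hand, from the
above discussion, we have

\[ \lim_{k \to \infty} d(T^n x_k, x_k) = 0. \]

Based on (cf$_{iii}$) of Proposition 1, there exists $k_0$ so that

\[ d(x^*, x_k) \le \frac{1}{4s^2}\bigl[\varepsilon^* - \gamma(\varepsilon^*)\bigr] \quad \text{and} \quad d(T^n x_k, x_k) \le \frac{1}{4s^2}\bigl[\varepsilon^* - \gamma(\varepsilon^*)\bigr], \quad k \ge k_0. \]

Accordingly, we find that

\[ \begin{aligned} \varepsilon^* = d(T^n x^*, x^*) &\le s\, d(T^n x^*, T^n x_k) + s^2\bigl[d(T^n x_k, x_k) + d(x_k, x^*)\bigr] \\ &\le \alpha\bigl(d(x^*, T^n x^*), d(x^*, T^n x_k), d(x^*, x_k), d(T^n x^*, x_k), d(T^n x_k, x_k)\bigr) + \frac{1}{2}\bigl[\varepsilon^* - \gamma(\varepsilon^*)\bigr]. \end{aligned} \tag{26} \]

On the other hand, we have

\[ d(x^*, T^n x_k) \le s\bigl[d(x^*, x_k) + d(x_k, T^n x_k)\bigr] \le \frac{1}{2s}\bigl[\varepsilon^* - \gamma(\varepsilon^*)\bigr] < \varepsilon^*, \]
\[ d(T^n x^*, x_k) \le s\bigl[d(T^n x^*, x^*) + d(x^*, x_k)\bigr] \le s\Bigl(\varepsilon^* + \frac{1}{4s^2}\bigl[\varepsilon^* - \gamma(\varepsilon^*)\bigr]\Bigr) < 2s\varepsilon^*. \]

Since $\alpha$ is nondecreasing in each component, the inequality (26) turns into

\[ \varepsilon^* \le \alpha(\varepsilon^*, \varepsilon^*, \varepsilon^*, 2s\varepsilon^*, 2s\varepsilon^*) + \frac{1}{2}\bigl[\varepsilon^* - \gamma(\varepsilon^*)\bigr] = \frac{1}{2}\bigl[\varepsilon^* + \gamma(\varepsilon^*)\bigr] < \varepsilon^*, \]

a contradiction. Accordingly, we conclude that

\[ T^n x^* = x^*. \]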
In the next step, we shall show that $x^*$ is the unique fixed point of $T^n$. Suppose,
on the contrary, that there is another fixed point $y^*$ of $T^n$, that is, $T^n y^* = y^* \ne x^*$,
where $n = n(x^*)$. On account of (cf$_{iii}$) of Proposition 1, from (24), we deduce that

\[ \begin{aligned} d(x^*, y^*) = d(T^n x^*, T^n y^*) &\le \frac{1}{s}\, \alpha\bigl(0, d(x^*, y^*), d(x^*, y^*), d(x^*, y^*), 0\bigr) \\ &\le \alpha\bigl(0, d(x^*, y^*), d(x^*, y^*), d(x^*, y^*), 0\bigr) \\ &\le \gamma(d(x^*, y^*)) < d(x^*, y^*), \end{aligned} \]

a contradiction. Hence, $x^*$ is the unique fixed point of $T^n$.

In what follows we show that $x^*$ is also the unique fixed point of $T$. Indeed, $T x^* =
T^n T x^*$ yields that $T x^*$ is also a fixed point of $T^n$. On account of the uniqueness, we conclude
that $T x^* = x^*$.


Remark. An immediate consequence of our main result is the Matkowski Theorem
[38], obtained by letting $s = 1$.


Theorem 21 Let $(X, d, s)$ be a complete b-metric space with $s \ge 1$ and let $T$ satisfy
the following condition: for each $x \in X$ there is a positive integer $n = n(x)$ such
that for all $y \in X$

\[ d(T^{n(x)}(x), T^{n(x)}(y)) \le \frac{1}{s}\, \mathcal{A}(x, y), \tag{27} \]

where

\[ \mathcal{A}(x, y) = a\bigl[d(x, T^{n(x)}(x)) + d(T^{n(x)}(y), y)\bigr] + b\bigl[d(x, T^{n(x)}(y)) + d(T^{n(x)}(x), y)\bigr] + c\, d(x, y), \]

and $a, b, c$ are non-negative reals with $s(3a + 3b + c) < 1$. Then $T$ has a unique
fixed point $x^*$ and $T^n(x) \to x^*$ for each $x \in X$.
Sketch of the proof. Take $\alpha(t_1, t_2, t_3, t_4, t_5) = a t_1 + b t_2 + c t_3 + b t_4 + a t_5$.
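
The sketch can be completed by a short computation, given here as a hedged illustration of why this linear choice of $\alpha$ meets the hypotheses of Theorem 20. Evaluating $\alpha$ on the diagonal pattern $(t, t, t, 2st, 2st)$ and using $s \ge 1$ gives:

```latex
\gamma(t) = \alpha(t, t, t, 2st, 2st)
          = a t + b t + c t + 2 s b t + 2 s a t
          = \bigl(a + b + c + 2 s (a + b)\bigr)\, t ,
\qquad
a + b + c + 2 s (a + b) \;\le\; s (a + b + c) + 2 s (a + b) \;=\; s (3a + 3b + c) \;<\; 1 .
```

Hence $\gamma(t) = \lambda t$ with $\lambda < 1$, so $\gamma$ is nondecreasing, $\sum_n \gamma^n(t) = t/(1 - \lambda) < \infty$ and $t - \gamma(t) = (1 - \lambda)t \to \infty$ as $t \to \infty$, which are the conditions (ccf1), (ccf2) and (ccf3).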


Theorem 22 Let $(X, d, s)$ be a complete b-metric space with $s \ge 1$ and let $\gamma$ be an
auxiliary function that satisfies (ccf1), (ccf2) and (ccf3). Suppose that $T : X \to
X$ satisfies the condition: for each $x \in X$ there is a positive integer $n(x)$ such
that for all $y \in X$

\[ d(T^{n(x)}(x), T^{n(x)}(y)) \le \gamma(d(x, y)). \tag{28} \]

Then $T$ has a unique fixed point $x^*$ and $T^n(x) \to x^*$ for each $x \in X$.


Remark. An immediate consequence of our Theorem 22 is the Bota Theorem [14],
obtained by letting $\gamma(t) = at$ where $a \in (0, 1)$.

3 Conclusion


In this paper, we give a brief survey of the fixed point results for mappings with
a contractive iterate at a point. Moreover, we give an example to indicate how
these results can be extended, by considering the famous results of Matkowski [38]. These
results can be extended further by considering the existing results in more general
frameworks like quasi-metric spaces, b-metric spaces, partial metric spaces, dislocated metric
spaces, Branciari distance spaces and their combinations. Indeed, in the first part of the
paper, some more examples of generalizations of the fixed point results for mappings
with a contractive iterate at a point were given. Another direction for future work, inspired by
the results in this paper, is the application to differential and integral equations.


An Algorithmic Perspective on the
Thermoelasticity of the Micromorphic
Materials Using Fractional Order Strain


Lavinia Codarcea-Munteanu and Marin Marin 


Abstract This chapter deals with the linear thermoelasticity of micromorphic
materials. Its purpose is to present an algorithmic perspective
for obtaining, on the one hand, the constitutive equations of this linear theory and
the form of the non-Fourier heat equation and, on the other hand, a reciprocal
relation, by using the Caputo fractional derivative. This algorithmic perspective
allows the creation of a technical scheme that can be followed in determining these
basic relations of the linear thermoelasticity of micromorphic materials. It provides
a model that can be used to determine these equations for other materials, so that
the theory can be extended to various materials. The algorithmic transposition of a
mathematical model always leads to a relaxation of the mathematical solution and
to its systematization, the mathematical reasoning being presented in the form of
concrete steps to be followed in order to obtain the solution of a problem.


Keywords Micromorphic materials · Algorithm · Thermoelasticity · Fractional
derivative


1 Introduction 


Developing algorithmic, logical thinking makes it possible to solve complex
problems that require the description of several calculation steps. In the analytical
process of solving a problem in a given field, we start from a hypothesis in which the
initial data, conditions and states are highlighted, and we arrive at a conclusion that
integrates the goals, the objectives and the requirements of that problem. This is
achieved by the reasoning that leads to the solution of the problem, a process that
involves the selective use of the theorems, statements and principles of the field in
which the problem is defined. The algorithmic method can be successfully applied in
solving a great variety of problems in describing natural phenomena as well as some
specific human activities. In many cases this method leads to the abridgement and
systematization of the mathematical methods, allowing easy application of the same
solving schemes to different cases.

Obviously, the method of obtaining the solution by describing an algorithm
cannot be applied to all problems, but this algorithmic method is a fundamental
requirement in terms of computer-solved problems. An algorithm, see Markov [30],
establishes various techniques and methods of solving that have been discovered or
finalized at a particular moment in the development of a domain. Throughout this
chapter, we will present an algorithmic approach to some themes of the linear
thermoelasticity of micromorphic materials under the action of the Caputo fractional
derivative, namely the determination of the constitutive equations, of the form of the
non-Fourier heat equation, and of a reciprocal relation valid in this theory.

The microstructural theories, whose foundations were laid by the Cosserat
brothers, see Cosserat and Cosserat [10], and which were subsequently developed by other
researchers, see Green and Rivlin [16]; Mindlin [31], created a favourable
context for the development of the micromorphic continuum theory. This development
is primarily due to Eringen, who approached the theory of micromorphic elastic
materials based on the theory of microstretch elastic bodies, see Eringen [11, 12],
developed a micromorphic theory for polycrystalline mixtures, granular composites and
fluid suspensions in Twiss and Eringen [38, 39], and extended the microcontinuum field
theories from the classical field theories to microscopic space and short time scales,
see Eringen [13, 14].

In micromorphic continua, each particle is considered to be deformable, having
twelve independent degrees of freedom: three displacement components, three
microrotation components and six microdeformation components.

There are conditions under which the micromorphic theory coincides with other
theories, see Cao et al. [2]: if the deformation is supposed to be infinitesimal and
the motion slow, the micromorphic theory coincides with the microstructure
theory; if the particle microstructure is considered to be rigid, it coincides
with the micropolar theory; and if the particles are regarded as mass points,
it reduces to the theory of the classical continuum. These
related theories have been the subject of much research over time. Some examples
concerning the theory of micromorphic media are Ieşan [21, 22]; Svanadze [37]; Ieşan and
Nappa [23]; Lee and Chen [25]; Gales [15]; Ieşan [24], as well as more recent
studies such as Romano et al. [35]; Liu et al. [26].
Several other approaches concern theories related to the theory of
micromorphic media, see Marin [27]; Marin et al. [28, 29]; Othman and Marin [32];
Hassan et al. [17], or the generalized micromorphic theory, see He [18]; Sheoran
et al. [36]; Zeng et al. [42].

Fractional calculus, see Baleanu et al. [1], was long regarded as purely theoretical,
but over the last several decades it has gained an important place in practical
applications. It is used in both classical and newer fields, with applications, for
instance, in chemistry, engineering science, thermodynamics and control theory.
In Povstenko [34], the author presents a variety of differential
and integral operators of fractional order: the Riemann-Liouville and Caputo
fractional derivatives and the Riesz fractional operators. Because of the importance of the
initial conditions, one is led to the conclusion that the Caputo fractional
derivative is “a regularization in the time origin for the Riemann-Liouville fractional
derivative”. The great benefit of this particular type of fractional derivative is evident
in physical applications of fractional order differential equations, in which the initial
conditions are usually expressed in terms of a function and its derivatives of integer
order, even though the order of the equation itself is fractional. Podlubny [33] presents
the first systematic analysis based on tables of fractional derivatives, which allow the
evaluation of various types of fractional derivatives.


2 Theory 


We consider a bounded region $D$ of the three-dimensional Euclidean space $\mathbb{R}^3$, 
which corresponds to a micromorphic thermoelastic body in the reference configuration, 
whose boundary $\partial D$ is a piecewise smooth surface and whose closure is denoted 
by $\bar{D}$. Throughout this chapter we use the classical summation and differentiation 
conventions: the Einstein summation over repeated subscripts is implied, partial 
differentiation with respect to the Cartesian coordinate $x_i$ is denoted by a comma 
followed by $i$, and partial differentiation of a function $f$ with respect to time is 
marked by a superposed dot. Where no confusion can arise, the time argument and the 
spatial argument of a function are omitted. The motion of the body is referred to a fixed 
system of rectangular Cartesian axes and Cartesian tensor notation is used. 
Greek indices take the values 1, 2 and Latin indices take the values 1, 2, 3. 
Considering an anisotropic body, the equations governing the linear thermoelasticity 
of micromorphic materials, see Iesan and Nappa [23], consist of: 

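— the equations of motion: 

$$
\begin{aligned}
& t_{ji,j} + \rho f_i = \rho \ddot{u}_i, \\
& m_{kij,k} + t_{ji} - s_{ji} + \rho l_{ij} = I_{jk}\ddot{\varphi}_{ik},
\end{aligned} \tag{1}
$$

— the equation of energy: 

$$ \rho T \dot{\eta} = q_{i,i} + \rho S, \tag{2} $$

— the geometric equations: 

$$
\begin{aligned}
& \varepsilon_{ij} = u_{j,i} - \varphi_{ji}, \\
& \mu_{ij} = \tfrac{1}{2}\left(\varphi_{ij} + \varphi_{ji}\right), \\
& \gamma_{ijk} = \varphi_{ij,k},
\end{aligned} \tag{3}
$$

where the above equations are verified at all points $(x,t) \in D \times (0, \infty)$, and the 
following notations were used: 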


$t_{ij}$, $s_{ij}$, $m_{kij}$ are the components of the stress tensor, of the 
microstress tensor and of the stress moment tensor over $D$, respectively, 

$u_i$, $\varphi_{ij}$ are the components of the displacement vector and of the 
microdeformation tensor, respectively, 

$\rho$ represents the reference mass density, 

$I_{ij}$ are the components of the microinertia tensor, having the symmetry property 
$I_{ij} = I_{ji}$, 

$f_i$ are the components of the body force vector, 

$l_{ij}$ are the components of the body moment tensor, 

$T$ represents the temperature measured from the constant absolute temperature $T_0$ 
of the solid in its reference configuration, with the additional notation for 
the variation of the body temperature 

$$ \theta(x, t) = T(x, t) - T_0, $$

$\eta$ represents the specific entropy per mass unit, 

$q_i$ are the components of the heat flux vector, 

$S$ represents the heat supply per mass unit, 

$\varepsilon_{ij}$, $\mu_{ij}$, $\gamma_{ijk}$ are the components of the strain tensors. 


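We associate the above equations with the following initial conditions: 

$$
\begin{aligned}
& u_i(x, 0) = a_i(x), && \dot{u}_i(x, 0) = b_i(x), && x \in \bar{D}, \\
& \varphi_{ij}(x, 0) = \alpha_{ij}(x), && \dot{\varphi}_{ij}(x, 0) = \beta_{ij}(x), && x \in \bar{D}, \\
& \theta(x, 0) = \theta^0(x), && \eta(x, 0) = \eta^0(x), && x \in \bar{D},
\end{aligned} \tag{4}
$$

and the following boundary conditions: 

$$
\begin{aligned}
& u_i(x,t) = \tilde{u}_i, && \text{on } \partial D_1 \times [0, \infty), \\
& \varphi_{ij}(x,t) = \tilde{\varphi}_{ij}, && \text{on } \partial D_2 \times [0, \infty), \\
& \theta(x,t) = \tilde{\theta}, && \text{on } \partial D_3 \times [0, \infty), \\
& t_i(x,t) := t_{ji}(x,t)\,n_j(x) = \tilde{t}_i, && \text{on } \partial D_1^c \times [0, \infty), \\
& m_{ij}(x,t) := m_{kij}(x,t)\,n_k(x) = \tilde{m}_{ij}, && \text{on } \partial D_2^c \times [0, \infty), \\
& q(x,t) := q_i(x,t)\,n_i(x) = \tilde{q}, && \text{on } \partial D_3^c \times [0, \infty),
\end{aligned} \tag{5}
$$

where $n_i$ denotes the components of the outward unit normal vector to the 
surface $\partial D$, and $t_i$, $m_{ij}$ are the surface traction components and the 
surface moment components, respectively. 

At the same time, $\partial D_1$, $\partial D_2$, $\partial D_3$ and their corresponding 
complements $\partial D_1^c$, $\partial D_2^c$, $\partial D_3^c$ are subsets of the surface 
$\partial D$ such that: 

$$
\begin{aligned}
& \partial D_1 \cup \partial D_1^c = \partial D_2 \cup \partial D_2^c = \partial D_3 \cup \partial D_3^c = \partial D, \\
& \partial D_1 \cap \partial D_1^c = \partial D_2 \cap \partial D_2^c = \partial D_3 \cap \partial D_3^c = \emptyset,
\end{aligned}
$$

and $a_i$, $b_i$, $\alpha_{ij}$, $\beta_{ij}$, $\theta^0$, $\eta^0$, $\tilde{u}_i$, $\tilde{\varphi}_{ij}$, 
$\tilde{\theta}$, $\tilde{t}_i$, $\tilde{m}_{ij}$, $\tilde{q}$ are prescribed continuous functions on 
their domains. 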

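The first principle of thermodynamics for micromorphic materials, following Yu 
et al. [41], can be expressed in the form: 

$$
\begin{aligned}
& \int_D \left(\rho \dot{u}_i \ddot{u}_i + I_{jk}\dot{\varphi}_{ij}\ddot{\varphi}_{ik} + \rho \dot{e}\right) dV = \\
& = \int_D \rho\left(f_i \dot{u}_i + l_{ij}\dot{\varphi}_{ij} + S\right) dV + \int_{\partial D} \left(t_i \dot{u}_i + m_{ij}\dot{\varphi}_{ij} + q\right) dA,
\end{aligned} \tag{6}
$$

where $e$ denotes the internal energy per unit mass. 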

In this chapter we use the Cattaneo heat conduction equation. As underlined in 
Cattaneo [5], in the mid-twentieth century Cattaneo addressed the paradox of the 
classical heat equation: because of its parabolic form, thermal signals propagate at 
infinite speed, in contradiction with the results of practical experiments. This 
requires a change of the Fourier model, see Liu et al. [26], in the form below: 

$$ q_i = -K_{ij}\left(\frac{\partial T}{\partial x_j} - \tau_0 \frac{\partial}{\partial t}\frac{\partial T}{\partial x_j}\right), \tag{7} $$

where $K_{ij}$ are the components of the thermal conductivity tensor and $\tau_0$ is a 
positive scalar called the thermal relaxation time of the heat conducting medium. This 
equation was modified afterwards on the supposition that the operator 
$1 - \tau_0 \frac{\partial}{\partial t}$ can be approximated by: 

$$ \frac{1}{1 + \tau_0 \dfrac{\partial}{\partial t}}. \tag{8} $$

Using the previous approximation, the transient Fourier model equation (7) turns 
into the equation that bears his name, the Cattaneo heat equation, see Cattaneo [5]; 
Hetnarski [19]: 

$$ q_i + \tau_0 \dot{q}_i = -K_{ij}T_{,j}, \quad i = 1, 2, 3. \tag{9} $$

In order to obtain some basic relations of the linear theory of thermoelasticity of 
micromorphic materials, we use the Caputo fractional derivative of order $\gamma$, see 
Caputo [3]; Caputo and Fabrizio [4], given by: 

$$ D_t^{\gamma} f(t) = \frac{1}{\Gamma(1-\gamma)} \int_0^t \frac{f'(s)}{(t-s)^{\gamma}}\, ds, \quad 0 < \gamma < 1, \tag{10} $$

where $\Gamma(\cdot)$ is the Gamma function and $f$ is assumed to be an 
absolutely continuous function of the time $t$. 
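
The definition (10) can be evaluated numerically. The following minimal sketch uses the standard L1 discretization (the derivative $f'$ is taken piecewise constant on a uniform grid and the kernel is integrated exactly); the function name `caputo_l1`, the grid size and the test functions are illustrative assumptions and are not part of the chapter.

```python
import math

def caputo_l1(f, t, gamma, n_steps=2000):
    """Approximate the Caputo derivative of order 0 < gamma < 1 at time t (L1 scheme)."""
    dt = t / n_steps
    total = 0.0
    for k in range(n_steps):
        # f' is approximated by a constant slope on [t_k, t_{k+1}]
        slope = (f((k + 1) * dt) - f(k * dt)) / dt
        # exact integral of (t - s)^(-gamma) over [t_k, t_{k+1}];
        # max(..., 0.0) guards against a tiny negative rounding error at s = t
        a = max(t - k * dt, 0.0) ** (1.0 - gamma)
        b = max(t - (k + 1) * dt, 0.0) ** (1.0 - gamma)
        total += slope * (a - b) / (1.0 - gamma)
    return total / math.gamma(1.0 - gamma)

# Known closed forms used as a check (assumed test cases):
#   D^gamma t   = t^(1-gamma) / Gamma(2-gamma)
#   D^gamma t^2 = 2 t^(2-gamma) / Gamma(3-gamma)
gamma_order = 0.5
d_linear = caputo_l1(lambda s: s, 1.0, gamma_order)
d_quadratic = caputo_l1(lambda s: s * s, 1.0, gamma_order)
```

For a linear function the scheme reproduces the closed form up to rounding, while for smooth nonlinear functions the error decreases as the grid is refined.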

Being a combination of the internal energy $e$ and the entropy $\eta$, 
the Helmholtz free energy $\Psi$ is defined as: 

$$ \Psi = e - T\eta. \tag{11} $$


3 Result 


In the sequel we follow the procedure presented in Youssef [40], see also Chirila [6]; 
Codarcea-Munteanu and Marin [7], in order to obtain, in an algorithmic manner and by 
using the Caputo fractional derivative, on the one hand the constitutive equations of 
the micromorphic thermoelasticity and the form of the non-Fourier heat equation, and 
on the other hand the reciprocal relation. We consider the state of the thermoelastic 
material as below: 

$$
\begin{aligned}
& \Psi = \Psi(\tilde{\varepsilon}_{ij}, \mu_{ij}, \gamma_{ijk}, T, T_{,i}), \\
& e = e(\tilde{\varepsilon}_{ij}, \mu_{ij}, \gamma_{ijk}, T, T_{,i}), \\
& \eta = \eta(\tilde{\varepsilon}_{ij}, \mu_{ij}, \gamma_{ijk}, T, T_{,i}), \\
& q_i = q_i(\tilde{\varepsilon}_{ij}, \mu_{ij}, \gamma_{ijk}, T, T_{,i}),
\end{aligned} \tag{12}
$$

where: 

$$ \tilde{\varepsilon}_{ij} = \left(1 + \tau^{\gamma} D_t^{\gamma}\right)\varepsilon_{ij}, \tag{13} $$

$\tau$ being the constant mechanical relaxation time parameter. 
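At the same time, we consider the free energy, according to the linear theory, see 
Yu et al. [41], in the next form: 

$$
\begin{aligned}
\rho\Psi(\tilde{\varepsilon}_{ij}, \mu_{ij}, \gamma_{ijk}, \theta) ={}& \tfrac{1}{2} A_{ijkl}\tilde{\varepsilon}_{ij}\tilde{\varepsilon}_{kl} + \tfrac{1}{2} B_{ijkl}\mu_{ij}\mu_{kl} + \tfrac{1}{2} C_{ijklmn}\gamma_{ijk}\gamma_{lmn} + E_{ijkl}\tilde{\varepsilon}_{ij}\mu_{kl} \\
& + F_{ijlmn}\tilde{\varepsilon}_{ij}\gamma_{lmn} + G_{ijlmn}\mu_{ij}\gamma_{lmn} - a_{ij}\tilde{\varepsilon}_{ij}\theta - b_{ij}\mu_{ij}\theta - c_{ijk}\gamma_{ijk}\theta - \tfrac{1}{2} a\theta^2,
\end{aligned} \tag{14}
$$

where the coefficients $A_{ijkl}$, $B_{ijkl}$, $C_{ijklmn}$, $E_{ijkl}$, $F_{ijlmn}$, $G_{ijlmn}$, $a_{ij}$, 
$b_{ij}$, $c_{ijk}$ and $a$ are prescribed functions, representing the material specifications 
and fulfilling the symmetry properties below: 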


Aijn = Anij, Bij = Brij = Bjiki, Cijkimn = Cimnijk, (15) 
Eijg = Exij, Gijimn = Gjilmn, Gj = ji, Cijk = Cjik- 


In the following we will approach both obtaining the constitutive equations corre- 
sponding to the thermoelasticity of the micromorphic media and the non-Fourier 
equation of the heat, presenting the steps to be accomplished to achieve the proposed 
aim. 


An Algorithmic Perspective on the Constitutive Equations 


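Step 1 
We multiply the equations of motion (1) by $\dot{u}_i$ and $\dot{\varphi}_{ij}$, respectively, 
obtaining the following relations: 

$$
\begin{aligned}
& t_{ji,j}\dot{u}_i + \rho f_i \dot{u}_i = \rho \ddot{u}_i \dot{u}_i, \\
& m_{kij,k}\dot{\varphi}_{ij} + t_{ji}\dot{\varphi}_{ij} - s_{ji}\dot{\varphi}_{ij} + \rho l_{ij}\dot{\varphi}_{ij} = I_{jk}\dot{\varphi}_{ij}\ddot{\varphi}_{ik}.
\end{aligned} \tag{16}
$$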


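Step 2 
We integrate the previously obtained relations over $D$ and introduce them into the 
first principle of thermodynamics (6), which results in the relation below: 

$$
\begin{aligned}
& \int_D \left(t_{ji,j}\dot{u}_i + m_{kij,k}\dot{\varphi}_{ij} + t_{ji}\dot{\varphi}_{ij} - s_{ji}\dot{\varphi}_{ij}\right) dV + \int_D \rho \dot{e}\, dV = \\
& = \int_D \rho S\, dV + \int_{\partial D} \left(t_{ji}\dot{u}_i n_j + m_{kij}\dot{\varphi}_{ij} n_k + q_i n_i\right) dA.
\end{aligned} \tag{17}
$$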


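Step 3 
We apply the divergence theorem in the above relation, obtaining: 

$$ \int_D \rho \dot{e}\, dV = \int_D \left[t_{ji}\left(\dot{u}_{i,j} - \dot{\varphi}_{ij}\right) + m_{kij}\dot{\varphi}_{ij,k} + s_{ji}\dot{\varphi}_{ij} + q_{i,i} + \rho S\right] dV. \tag{18} $$

Step 4 
Taking into account the continuity of the integrands and the fact that the domain 
$D$ of integration is arbitrary, we obtain from the previous relation its pointwise form: 

$$ \rho \dot{e} = t_{ji}\left(\dot{u}_{i,j} - \dot{\varphi}_{ij}\right) + m_{kij}\dot{\varphi}_{ij,k} + s_{ji}\dot{\varphi}_{ij} + q_{i,i} + \rho S. \tag{19} $$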
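Step 5 
We use, on the one hand, $\dot{\Psi} = \dot{e} - \dot{T}\eta - T\dot{\eta}$, obtained from the relation (11), and, 
on the other hand, the fractional order strain, getting: 

$$ \rho\dot{\Psi} = t_{ij}\dot{\tilde{\varepsilon}}_{ij} + m_{kij}\dot{\gamma}_{ijk} + s_{ij}\dot{\mu}_{ij} + q_{i,i} + \rho S - \rho \dot{T}\eta - \rho T\dot{\eta}. \tag{20} $$

Step 6 
We consider the relation (12)$_1$, $\Psi = \Psi(\tilde{\varepsilon}_{ij}, \mu_{ij}, \gamma_{ijk}, T, T_{,i})$, and the chain rule, 
having: 

$$ \dot{\Psi} = \frac{\partial \Psi}{\partial \tilde{\varepsilon}_{ij}}\dot{\tilde{\varepsilon}}_{ij} + \frac{\partial \Psi}{\partial \mu_{ij}}\dot{\mu}_{ij} + \frac{\partial \Psi}{\partial \gamma_{ijk}}\dot{\gamma}_{ijk} + \frac{\partial \Psi}{\partial T}\dot{T} + \frac{\partial \Psi}{\partial T_{,i}}\dot{T}_{,i}. \tag{21} $$

Step 7 
Comparing the relations (20) and (21), we obtain: 

$$
\begin{aligned}
& t_{ij} = \rho\frac{\partial \Psi}{\partial \tilde{\varepsilon}_{ij}}, \quad s_{ij} = \rho\frac{\partial \Psi}{\partial \mu_{ij}}, \quad m_{kij} = \rho\frac{\partial \Psi}{\partial \gamma_{ijk}}, \\
& \eta = -\frac{\partial \Psi}{\partial T}, \quad \frac{\partial \Psi}{\partial T_{,i}} = 0, \quad \rho T\dot{\eta} = q_{i,i} + \rho S.
\end{aligned} \tag{22}
$$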


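Step 8 
From the form of the free energy (14), taking into account the previous relations, we 
obtain the following constitutive equations of micromorphic materials: 

$$
\begin{aligned}
t_{ij} &= A_{ijkl}\tilde{\varepsilon}_{kl} + E_{ijkl}\mu_{kl} + F_{ijlmn}\gamma_{lmn} - a_{ij}\theta = \\
&= A_{ijkl}\left(1 + \tau^{\gamma}D_t^{\gamma}\right)\varepsilon_{kl} + E_{ijkl}\mu_{kl} + F_{ijlmn}\gamma_{lmn} - a_{ij}\theta, \\
s_{ij} &= B_{ijkl}\mu_{kl} + E_{klij}\tilde{\varepsilon}_{kl} + G_{ijlmn}\gamma_{lmn} - b_{ij}\theta = \\
&= B_{ijkl}\mu_{kl} + E_{klij}\left(1 + \tau^{\gamma}D_t^{\gamma}\right)\varepsilon_{kl} + G_{ijlmn}\gamma_{lmn} - b_{ij}\theta, \\
m_{kij} &= C_{ijklmn}\gamma_{lmn} + F_{lmijk}\tilde{\varepsilon}_{lm} + G_{lmijk}\mu_{lm} - c_{ijk}\theta = \\
&= C_{ijklmn}\gamma_{lmn} + F_{lmijk}\left(1 + \tau^{\gamma}D_t^{\gamma}\right)\varepsilon_{lm} + G_{lmijk}\mu_{lm} - c_{ijk}\theta, \\
\rho\eta &= a_{ij}\tilde{\varepsilon}_{ij} + b_{ij}\mu_{ij} + c_{ijk}\gamma_{ijk} + a\theta = \\
&= a_{ij}\left(1 + \tau^{\gamma}D_t^{\gamma}\right)\varepsilon_{ij} + b_{ij}\mu_{ij} + c_{ijk}\gamma_{ijk} + a\theta.
\end{aligned} \tag{23}
$$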


Following the eight steps described above, we infer the theorem that refers to the 
determination of the constitutive equations corresponding to the linear theory of the 
thermoelasticity of the micromorphic media, namely: 


Theorem 3.1 The micromorphic materials generalized thermoelasticity using frac- 
tional order strain has the constitutive equations described by the relations (23). 
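
The passage from the free energy (14) to the constitutive equations (23) can be checked numerically on a scalar analogue, in which every tensor is replaced by a single component. This is only a sketch: the coefficient values in `COEFFS`, the sample state and the helper names are illustrative assumptions, and `e` stands for one component of the fractional order strain $\tilde{\varepsilon}_{ij}$.

```python
# Assumed scalar stand-ins for A, B, C, E, F, G, a_ij, b_ij, c_ijk and a
COEFFS = {"A": 3.0, "B": 2.0, "C": 1.5, "E": 0.4, "F": 0.3, "G": 0.2,
          "a1": 0.7, "b1": 0.5, "c1": 0.1, "a": 1.2}

def rho_psi(e, m, g, theta, c=COEFFS):
    """Scalar analogue of the quadratic free energy (14)."""
    return (0.5 * c["A"] * e * e + 0.5 * c["B"] * m * m + 0.5 * c["C"] * g * g
            + c["E"] * e * m + c["F"] * e * g + c["G"] * m * g
            - c["a1"] * e * theta - c["b1"] * m * theta - c["c1"] * g * theta
            - 0.5 * c["a"] * theta * theta)

def constitutive(e, m, g, theta, c=COEFFS):
    """Scalar analogue of the constitutive equations (23)."""
    t = c["A"] * e + c["E"] * m + c["F"] * g - c["a1"] * theta
    s = c["B"] * m + c["E"] * e + c["G"] * g - c["b1"] * theta
    mm = c["C"] * g + c["F"] * e + c["G"] * m - c["c1"] * theta
    rho_eta = c["a1"] * e + c["b1"] * m + c["c1"] * g + c["a"] * theta
    return t, s, mm, rho_eta

def central_diff(fun, args, index, h=1e-5):
    """Central finite difference of fun with respect to args[index]."""
    up, dn = list(args), list(args)
    up[index] += h
    dn[index] -= h
    return (fun(*up) - fun(*dn)) / (2.0 * h)

state = (0.3, -0.2, 0.5, 0.1)            # assumed values of (e, m, g, theta)
t, s, mm, rho_eta = constitutive(*state)
dpsi = [central_diff(rho_psi, state, i) for i in range(4)]
# Expected: t, s, mm equal the first three derivatives; rho_eta equals minus the fourth
```

Since the free energy is quadratic, the central differences agree with the closed-form derivatives up to rounding error.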


An Algorithmic Perspective on the Non-Fourier Heat Equation 
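Step 1 
The equation of energy (22)$_6$ can be rewritten in an unfolded form based on the 
relation (22)$_4$, as follows: 

$$ -\rho T\left(\frac{\partial^2 \Psi}{\partial T \partial \tilde{\varepsilon}_{ij}}\dot{\tilde{\varepsilon}}_{ij} + \frac{\partial^2 \Psi}{\partial T \partial \mu_{ij}}\dot{\mu}_{ij} + \frac{\partial^2 \Psi}{\partial T \partial \gamma_{ijk}}\dot{\gamma}_{ijk} + \frac{\partial^2 \Psi}{\partial T^2}\dot{T}\right) = q_{i,i} + \rho S. \tag{24} $$

Step 2 
The previous relation is written in the next equivalent form: 

$$ -T\left[\frac{\partial}{\partial T}\left(\rho\frac{\partial \Psi}{\partial \tilde{\varepsilon}_{ij}}\right)\dot{\tilde{\varepsilon}}_{ij} + \frac{\partial}{\partial T}\left(\rho\frac{\partial \Psi}{\partial \mu_{ij}}\right)\dot{\mu}_{ij} + \frac{\partial}{\partial T}\left(\rho\frac{\partial \Psi}{\partial \gamma_{ijk}}\right)\dot{\gamma}_{ijk} - \frac{\partial}{\partial T}\left(-\rho\frac{\partial \Psi}{\partial T}\right)\dot{T}\right] = q_{i,i} + \rho S. \tag{25} $$

Step 3 
Using the relations (22)$_1$, (22)$_2$, (22)$_3$ and (22)$_4$, we get from the above relation: 

$$ q_{i,i} = -\rho S - T\left(\frac{\partial t_{ij}}{\partial T}\dot{\tilde{\varepsilon}}_{ij} + \frac{\partial s_{ij}}{\partial T}\dot{\mu}_{ij} + \frac{\partial m_{kij}}{\partial T}\dot{\gamma}_{ijk} - \rho\frac{\partial \eta}{\partial T}\dot{T}\right). \tag{26} $$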


Step 4 
Considering the constitutive equations corresponding to the micromorphic materials 
thermoelasticity (23), the previous relation leads us to: 

$$ q_{i,i} = -\rho S + a_{ij}T\dot{\tilde{\varepsilon}}_{ij} + b_{ij}T\dot{\mu}_{ij} + c_{ijk}T\dot{\gamma}_{ijk} + aT\dot{T}. \tag{27} $$
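
As a short worked step, the temperature derivatives needed here can be read off from the constitutive equations (23). The sketch below assumes that the material coefficients do not depend on $T$ and uses $\theta = T - T_0$, so that $\partial\theta/\partial T = 1$:

```latex
\frac{\partial t_{ij}}{\partial T} = -a_{ij}, \qquad
\frac{\partial s_{ij}}{\partial T} = -b_{ij}, \qquad
\frac{\partial m_{kij}}{\partial T} = -c_{ijk}, \qquad
\rho\,\frac{\partial \eta}{\partial T} = a .
```

Each of these derivatives is multiplied by $T$ and by the corresponding rate, which produces the four temperature-weighted terms of (27).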


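Step 5 
In the case of linearity, considering $T \approx T_0$, the above relation becomes: 

$$ q_{i,i} = -\rho S + a_{ij}T_0\dot{\tilde{\varepsilon}}_{ij} + b_{ij}T_0\dot{\mu}_{ij} + c_{ijk}T_0\dot{\gamma}_{ijk} + aT_0\dot{T}. \tag{28} $$

Step 6 
The partial differentiation of the Cattaneo heat equation (9) with respect to the 
Cartesian coordinate $x_i$ leads to: 

$$ q_{i,i} + \tau_0 \dot{q}_{i,i} = \left(-K_{ij}T_{,j}\right)_{,i}, \quad i, j = 1, 2, 3. \tag{29} $$

Step 7 
Considering both relations (28) and (29), we obtain: 

$$
\begin{aligned}
\left(-K_{ij}T_{,j}\right)_{,i} ={}& -\rho S + T_0\left[a_{ij}\left(1 + \tau^{\gamma}D_t^{\gamma}\right)\dot{\varepsilon}_{ij} + b_{ij}\dot{\mu}_{ij} + c_{ijk}\dot{\gamma}_{ijk} + a\dot{T}\right] \\
& - \tau_0\left[\rho\dot{S} - a_{ij}T_0\left(1 + \tau^{\gamma}D_t^{\gamma}\right)\ddot{\varepsilon}_{ij} - b_{ij}T_0\ddot{\mu}_{ij} - c_{ijk}T_0\ddot{\gamma}_{ijk} - aT_0\ddot{T}\right].
\end{aligned} \tag{30}
$$

Step 8 
We write the previous relation (30) in an equivalent form, as follows: 

$$ \left(-K_{ij}T_{,j}\right)_{,i} = -\left(1 + \tau_0\frac{\partial}{\partial t}\right)\left[\rho S - a_{ij}T_0\left(1 + \tau^{\gamma}D_t^{\gamma}\right)\dot{\varepsilon}_{ij} - b_{ij}T_0\dot{\mu}_{ij} - c_{ijk}T_0\dot{\gamma}_{ijk} - aT_0\dot{T}\right]. \tag{31} $$

These were the concrete steps leading to the form of the non-Fourier heat equation in 
the context of the micromorphic thermoelasticity that uses the Caputo fractional 
derivative, thus we can formulate the next theorem: 


Theorem 3.2 The non-Fourier heat equation of the micromorphic thermoelasticity 
using fractional order strain is given by the Eq. (31). 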
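
The non-Fourier character of this heat equation comes from the relaxation time $\tau_0$ in the Cattaneo law (9). As a minimal sketch, the code below integrates the one-dimensional law $q + \tau_0 \dot{q} = -K\,T_{,x}$ for a constant temperature gradient and shows that the flux relaxes towards the Fourier value instead of reaching it instantly. The scalar conductivity `K`, the gradient `grad_T`, the time step and the function names are illustrative assumptions, not data from the chapter.

```python
import math

def cattaneo_flux(K, tau0, grad_T, t_end, n_steps):
    """Integrate q + tau0 * dq/dt = -K * grad_T with q(0) = 0 by backward Euler."""
    dt = t_end / n_steps
    q = 0.0
    for _ in range(n_steps):
        # backward Euler: q_new + tau0 * (q_new - q) / dt = -K * grad_T
        q = (tau0 * q - dt * K * grad_T) / (tau0 + dt)
    return q

def cattaneo_flux_exact(K, tau0, grad_T, t):
    """Closed-form solution for a constant gradient and q(0) = 0."""
    return -K * grad_T * (1.0 - math.exp(-t / tau0))

K, tau0, grad_T = 2.0, 0.5, 3.0          # assumed illustrative values
q_fourier = -K * grad_T                   # classical Fourier flux (limit tau0 -> 0)
q_early = cattaneo_flux(K, tau0, grad_T, t_end=0.1, n_steps=1000)
q_late = cattaneo_flux(K, tau0, grad_T, t_end=5.0, n_steps=50000)
q_late_exact = cattaneo_flux_exact(K, tau0, grad_T, 5.0)
```

Shortly after the gradient is applied, `q_early` is still far from `q_fourier`, whereas after several relaxation times `q_late` is close to it.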


An Algorithmic Perspective on the Reciprocity 


In order to determine a reciprocal relation in the context of our theory, before 
describing the steps leading to this relation, we point out the following helpful 
definitions, notations and theorem: 


— following the method of Iesan, presented in Iesan [20], see also Codarcea-Munteanu 
et al. [8]; Codarcea-Munteanu and Marin [9], to simplify the writing, 
we reformulate the motion equations in the form below: 

$$
\begin{aligned}
& t_{ji,j} + F_i = \rho \ddot{u}_i, \\
& m_{kij,k} + t_{ji} - s_{ji} + L_{ij} = I_{jk}\ddot{\varphi}_{ik},
\end{aligned} \tag{32}
$$

and the energy equation as: 

$$ \rho T_0 \dot{\eta} = q_{i,i} + H, \tag{33} $$

where $\rho f_i = F_i$, $\rho l_{ij} = L_{ij}$ and $\rho S = H$; 

— the reciprocity subject will be developed considering the initial conditions incor- 
porated in the specific equations characterizing the thermoelasticity of the micro- 
morphic materials; 

— we recall the definition of the convolution product: 

$$ (g * r)(x, t) = \int_0^t g(x, t - s)\, r(x, s)\, ds, \quad (x, t) \in D \times [0, \infty), \tag{34} $$

where $g$ and $r$ are continuous functions of time; 
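— in the following, in order to obtain a reciprocity relation, we also use the result 
given by the next theorem: 


Theorem 3.3 The functions $u_i$, $\varphi_{ij}$ and $\eta$ verify the Eqs. (32), (33) and the initial 
conditions (4) if and only if the following relations are satisfied: 

$$
\begin{aligned}
& h * t_{ji,j} + \mathcal{F}_i = \rho u_i, \\
& h * \left(m_{kij,k} + t_{ji} - s_{ji}\right) + \mathcal{L}_{ij} = I_{jk}\varphi_{ik}, \\
& \rho\eta = \frac{1}{T_0}\, l * q_{i,i} + Q,
\end{aligned} \tag{35}
$$

where: 

$$ h(t) = (l * l)(t), \quad l(t) = 1, \quad t \in [0, \infty), \tag{36} $$

$$
\begin{aligned}
& \mathcal{F}_i = h * F_i + \rho\left(t b_i + a_i\right), \\
& \mathcal{L}_{ij} = h * L_{ij} + I_{jk}\left(t \beta_{ik} + \alpha_{ik}\right);
\end{aligned} \tag{37}
$$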


— we consider two external data systems $\mathcal{Y}^{(\xi)}$ acting successively on the thermoelastic 
micromorphic material, defined by: 


€ x) 29 HO 2O 2O FO 
yO — FO.LO Ho, Pe a mea, fi a, DO, oie 09 A®}, €=1,2: 


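— we denote a solution of the mixed problem, corresponding to $\mathcal{Y}^{(\xi)}$, by: 

$$ s^{(\xi)} = \left\{u_i^{(\xi)}, \varphi_{ij}^{(\xi)}, \theta^{(\xi)}\right\}, \quad \xi = 1, 2; \tag{39} $$

— we will have: 

$$
\begin{aligned}
& t_i^{(\xi)} = t_{ji}^{(\xi)} n_j, \quad m_{ij}^{(\xi)} = m_{kij}^{(\xi)} n_k, \quad q^{(\xi)} = q_i^{(\xi)} n_i, \\
& Q^{(\xi)} = \frac{1}{T_0}\, l * H^{(\xi)} + \rho\,\eta^{0(\xi)}, \quad \xi = 1, 2.
\end{aligned} \tag{40}
$$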


For the purpose of obtaining a reciprocity relation, we present the algorithmic method 
whose steps to be followed are described below, mentioning the fact that this algo- 
rithm has in turn a “sub-algorithm’’, the second step being described by a series of 
“sub-steps”’. 
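
Before going through the steps, the convolution product (34) and the functions $l$ and $h$ from (36) can be illustrated numerically. The sketch below uses a trapezoidal rule in time only (the spatial argument is dropped); the function names, the grid size and the sample functions are illustrative assumptions.

```python
import math

def convolve(g, r, t, n=2000):
    """Trapezoidal approximation of (g * r)(t) = int_0^t g(t - s) r(s) ds."""
    if t == 0.0:
        return 0.0
    ds = t / n
    total = 0.5 * (g(t) * r(0.0) + g(0.0) * r(t))   # end points s = 0 and s = t
    for k in range(1, n):
        s = k * ds
        total += g(t - s) * r(s)
    return total * ds

def ell(t):
    """The function l(t) = 1 from (36)."""
    return 1.0

def h(t):
    """h = l * l, which should coincide with h(t) = t."""
    return convolve(ell, ell, t)

h_at_2 = h(2.0)
# The convolution product is commutative: g * r = r * g
gr = convolve(math.exp, lambda s: s * s, 1.5)
rg = convolve(lambda s: s * s, math.exp, 1.5)
# Convolution with l acts as integration in time: (l * 2s)(3) = 9
integral_check = convolve(ell, lambda s: 2.0 * s, 3.0)
```

The commutativity checked here is the property that the symmetry argument of Step 2 relies on.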


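Step 1 
We introduce the functions $F_{\xi\nu}$, given by: 

$$ F_{\xi\nu} = t_{ij}^{(\xi)} * \tilde{\varepsilon}_{ij}^{(\nu)} + s_{ij}^{(\xi)} * \mu_{ij}^{(\nu)} + m_{kij}^{(\xi)} * \gamma_{ijk}^{(\nu)} - \left(\rho\eta^{(\xi)}\right) * \theta^{(\nu)}. \tag{41} $$

Step 2 
We prove the symmetry property of the above newly introduced functions $F_{\xi\nu}$, 
namely: 

$$ F_{\xi\nu} = F_{\nu\xi}, \quad \xi = 1, 2, \quad \nu = 1, 2. \tag{42} $$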


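Step 2.1 
Using the first constitutive equation (23)$_1$ and the first property of symmetry 
(15)$_1$, we compute the convolution product $A_{ijkl}\tilde{\varepsilon}_{kl}^{(\xi)} * \tilde{\varepsilon}_{ij}^{(\nu)}$, obtaining the following 
relation: 

$$
\begin{aligned}
& t_{ij}^{(\xi)} * \tilde{\varepsilon}_{ij}^{(\nu)} - \left[E_{ijkl}\mu_{kl}^{(\xi)} + F_{ijlmn}\gamma_{lmn}^{(\xi)} - a_{ij}\theta^{(\xi)}\right] * \tilde{\varepsilon}_{ij}^{(\nu)} = \\
& = t_{ij}^{(\nu)} * \tilde{\varepsilon}_{ij}^{(\xi)} - \left[E_{ijkl}\mu_{kl}^{(\nu)} + F_{ijlmn}\gamma_{lmn}^{(\nu)} - a_{ij}\theta^{(\nu)}\right] * \tilde{\varepsilon}_{ij}^{(\xi)}.
\end{aligned} \tag{43}
$$

Step 2.2 
Calculating the convolution product $B_{ijkl}\mu_{kl}^{(\xi)} * \mu_{ij}^{(\nu)}$ by using the second 
constitutive equation (23)$_2$ and the second property of symmetry (15)$_2$, we have: 

$$
\begin{aligned}
& s_{ij}^{(\xi)} * \mu_{ij}^{(\nu)} - \left[E_{klij}\tilde{\varepsilon}_{kl}^{(\xi)} + G_{ijlmn}\gamma_{lmn}^{(\xi)} - b_{ij}\theta^{(\xi)}\right] * \mu_{ij}^{(\nu)} = \\
& = s_{ij}^{(\nu)} * \mu_{ij}^{(\xi)} - \left[E_{klij}\tilde{\varepsilon}_{kl}^{(\nu)} + G_{ijlmn}\gamma_{lmn}^{(\nu)} - b_{ij}\theta^{(\nu)}\right] * \mu_{ij}^{(\xi)}.
\end{aligned} \tag{44}
$$

Step 2.3 
Considering the third constitutive equation (23)$_3$ and the third symmetry 
property (15)$_3$, we develop the convolution product $C_{ijklmn}\gamma_{lmn}^{(\xi)} * \gamma_{ijk}^{(\nu)}$, getting: 

$$
\begin{aligned}
& m_{kij}^{(\xi)} * \gamma_{ijk}^{(\nu)} - \left[F_{lmijk}\tilde{\varepsilon}_{lm}^{(\xi)} + G_{lmijk}\mu_{lm}^{(\xi)} - c_{ijk}\theta^{(\xi)}\right] * \gamma_{ijk}^{(\nu)} = \\
& = m_{kij}^{(\nu)} * \gamma_{ijk}^{(\xi)} - \left[F_{lmijk}\tilde{\varepsilon}_{lm}^{(\nu)} + G_{lmijk}\mu_{lm}^{(\nu)} - c_{ijk}\theta^{(\nu)}\right] * \gamma_{ijk}^{(\xi)}.
\end{aligned} \tag{45}
$$

Step 2.4 
From the fourth constitutive equation (23)$_4$, by computing the convolution product 
$a\theta^{(\xi)} * \theta^{(\nu)}$, we deduce the next relation: 

$$
\begin{aligned}
& \left(\rho\eta^{(\xi)}\right) * \theta^{(\nu)} - \left[a_{ij}\tilde{\varepsilon}_{ij}^{(\xi)} + b_{ij}\mu_{ij}^{(\xi)} + c_{ijk}\gamma_{ijk}^{(\xi)}\right] * \theta^{(\nu)} = \\
& = \left(\rho\eta^{(\nu)}\right) * \theta^{(\xi)} - \left[a_{ij}\tilde{\varepsilon}_{ij}^{(\nu)} + b_{ij}\mu_{ij}^{(\nu)} + c_{ijk}\gamma_{ijk}^{(\nu)}\right] * \theta^{(\xi)}.
\end{aligned} \tag{46}
$$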

Step 2.5 
By making the sum of the relations (43), (44), (45) and then subtracting the 
relation (46), the wanted property of symmetry will be obtained. 
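
The symmetry obtained in Step 2 can be checked numerically on a scalar analogue, in which each tensor is replaced by one component and each time history by samples on a uniform grid. This is only a sketch under those assumptions: the coefficients in `C`, the two sample states and the discrete convolution `conv` are illustrative and not taken from the chapter.

```python
import math

# Assumed scalar stand-ins for the material coefficients in (23)
C = {"A": 3.0, "B": 2.0, "Cc": 1.5, "E": 0.4, "F": 0.3, "G": 0.2,
     "a1": 0.7, "b1": 0.5, "c1": 0.1, "a": 1.2}

def conv(f, g, dt):
    """Discrete convolution of two sampled histories, evaluated at the final time."""
    n = len(f)
    return dt * sum(f[n - 1 - k] * g[k] for k in range(n))

def stresses(e, m, g, th):
    """Scalar analogue of (23), applied sample by sample."""
    rows = list(zip(e, m, g, th))
    t = [C["A"] * x + C["E"] * y + C["F"] * z - C["a1"] * w for x, y, z, w in rows]
    s = [C["B"] * y + C["E"] * x + C["G"] * z - C["b1"] * w for x, y, z, w in rows]
    mm = [C["Cc"] * z + C["F"] * x + C["G"] * y - C["c1"] * w for x, y, z, w in rows]
    rho_eta = [C["a1"] * x + C["b1"] * y + C["c1"] * z + C["a"] * w for x, y, z, w in rows]
    return t, s, mm, rho_eta

def F(state_xi, state_nu, dt):
    """Scalar analogue of the function F_{xi nu} from (41)."""
    t, s, mm, rho_eta = stresses(*state_xi)
    e, m, g, th = state_nu
    return conv(t, e, dt) + conv(s, m, dt) + conv(mm, g, dt) - conv(rho_eta, th, dt)

n, dt = 200, 0.01
times = [k * dt for k in range(n)]
# Two unrelated sample histories (e, m, g, theta), chosen arbitrarily
state1 = ([math.sin(x) for x in times], [x * x for x in times],
          [math.exp(-x) for x in times], [0.5 * x for x in times])
state2 = ([math.cos(x) for x in times], [1.0 + x for x in times],
          [x ** 3 for x in times], [math.sin(2.0 * x) for x in times])
F12 = F(state1, state2, dt)
F21 = F(state2, state1, dt)
```

The two values agree up to rounding because the cross terms pair up under the commutativity of the convolution, which is the content of relations (43) to (46).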


Step 3 
We compare the symmetry relation above with that of the classical theory of 
thermoelasticity of micromorphic bodies, expressed by the following lemma: 


Lemma 3.1 The functions $F_{\xi\nu}$, given by: 

$$ F_{\xi\nu} = t_{ij}^{(\xi)} * \varepsilon_{ij}^{(\nu)} + s_{ij}^{(\xi)} * \mu_{ij}^{(\nu)} + m_{kij}^{(\xi)} * \gamma_{ijk}^{(\nu)} - \left(\rho\eta^{(\xi)}\right) * \theta^{(\nu)}, \tag{47} $$

with the symmetry conditions (15) fulfilled, satisfy the symmetry property: 

$$ F_{\xi\nu} = F_{\nu\xi}, \quad \xi = 1, 2, \quad \nu = 1, 2. \tag{48} $$

We observe that the classical theory of thermoelasticity of micromorphic materials 
is not affected by the use of the fractional derivative, since $\tilde{\varepsilon}_{ij}$ 
fulfils the classical symmetry relation (48) as well, so we can continue the reasoning 
as in the classical theory, through the next step: 


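Step 4 
We use the geometric equations (3) and the relation (35)$_3$ in the relation (47), 
obtaining a new form for the functions $F_{\xi\nu}$, as follows: 

$$
\begin{aligned}
F_{\xi\nu} ={}& t_{ji}^{(\xi)} * \left(u_{i,j}^{(\nu)} - \varphi_{ij}^{(\nu)}\right) + s_{ji}^{(\xi)} * \varphi_{ij}^{(\nu)} + m_{kij}^{(\xi)} * \varphi_{ij,k}^{(\nu)} - \left(\frac{1}{T_0}\, l * q_{i,i}^{(\xi)} + Q^{(\xi)}\right) * \theta^{(\nu)} = \\
={}& \left[t_{ji}^{(\xi)} * u_i^{(\nu)} + m_{jik}^{(\xi)} * \varphi_{ik}^{(\nu)} - \frac{1}{T_0}\, l * q_j^{(\xi)} * \theta^{(\nu)}\right]_{,j} - t_{ji,j}^{(\xi)} * u_i^{(\nu)} - \\
& - \left(m_{kij,k}^{(\xi)} + t_{ji}^{(\xi)} - s_{ji}^{(\xi)}\right) * \varphi_{ij}^{(\nu)} + \frac{1}{T_0}\left(l * q_i^{(\xi)}\right) * \theta_{,i}^{(\nu)} - Q^{(\xi)} * \theta^{(\nu)}.
\end{aligned} \tag{49}
$$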


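Step 5 
Using in the previous relation (49) the relations (35), (37) and (40), the properties of 
the convolution product and the divergence theorem and, at the same time, the 
relation $q_i = K_{ij}\theta_{,j}$ from the linear theory, we obtain: 

$$
\begin{aligned}
\int_D \left(h * F_{\xi\nu}\right) dV ={}& \int_{\partial D} h * \left(t_i^{(\xi)} * u_i^{(\nu)} + m_{ij}^{(\xi)} * \varphi_{ij}^{(\nu)} - \frac{1}{T_0}\, l * q^{(\xi)} * \theta^{(\nu)}\right) dA + \\
& + \int_D \Big[\mathcal{F}_i^{(\xi)} * u_i^{(\nu)} + \mathcal{L}_{ij}^{(\xi)} * \varphi_{ij}^{(\nu)} - h * Q^{(\xi)} * \theta^{(\nu)} - \rho u_i^{(\xi)} * u_i^{(\nu)} - I_{jk}\varphi_{ik}^{(\xi)} * \varphi_{ij}^{(\nu)} + \\
& + \frac{1}{T_0}\, h * l * \left(K_{ij}\theta_{,j}^{(\xi)} * \theta_{,i}^{(\nu)}\right)\Big] dV,
\end{aligned} \tag{50}
$$

where: 

$$
\begin{aligned}
& \mathcal{F}_i^{(\xi)} = h * F_i^{(\xi)} + \rho\left(t b_i^{(\xi)} + a_i^{(\xi)}\right), \\
& \mathcal{L}_{ij}^{(\xi)} = h * L_{ij}^{(\xi)} + I_{jk}\left(t \beta_{ik}^{(\xi)} + \alpha_{ik}^{(\xi)}\right), \\
& Q^{(\xi)} = \frac{1}{T_0}\, l * H^{(\xi)} + \rho\,\eta^{0(\xi)}.
\end{aligned} \tag{51}
$$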


We can conclude that the reciprocity under the action of the Caputo fractional 
derivative on the thermoelasticity of micromorphic materials is exactly 
the same as the reciprocity of the classical theory. Therefore, at the last step we can 
deduce, based on the previous steps, the theorem that holds both in the classical theory 
and in the theory of thermoelasticity of micromorphic bodies in which we have 
used the fractional order strain, and that gives the classical reciprocal relation of 
Betti type, namely: 

Step 6 


Theorem 3.4 If the symmetry conditions (15) are fulfilled, the microinertia tensor 
$I_{ij}$ and the thermal conductivity tensor $K_{ij}$ are symmetric, and 
$s^{(\xi)}$ is a solution of the mixed problem corresponding to the external data system 
$\mathcal{Y}^{(\xi)}$, $\xi = 1, 2$, then the following reciprocity relation is satisfied: 

$$
\begin{aligned}
& \int_D \left(\mathcal{F}_i^{(1)} * u_i^{(2)} + \mathcal{L}_{ij}^{(1)} * \varphi_{ij}^{(2)} - h * Q^{(1)} * \theta^{(2)}\right) dV + \\
& + \int_{\partial D} h * \left(t_i^{(1)} * u_i^{(2)} + m_{ij}^{(1)} * \varphi_{ij}^{(2)} - \frac{1}{T_0}\, l * q^{(1)} * \theta^{(2)}\right) dA = \\
& = \int_D \left(\mathcal{F}_i^{(2)} * u_i^{(1)} + \mathcal{L}_{ij}^{(2)} * \varphi_{ij}^{(1)} - h * Q^{(2)} * \theta^{(1)}\right) dV + \\
& + \int_{\partial D} h * \left(t_i^{(2)} * u_i^{(1)} + m_{ij}^{(2)} * \varphi_{ij}^{(1)} - \frac{1}{T_0}\, l * q^{(2)} * \theta^{(1)}\right) dA.
\end{aligned}
$$


4 Discussions and Conclusions 


The algorithmic approach adopted in this chapter organizes the reasoning into concrete 
steps, both for deriving the constitutive equations and the non-Fourier heat equation 
of the thermoelasticity of micromorphic materials with fractional order strain, and for 
showing that the Betti-type relation of the classical theory remains valid in the linear 
theory of thermoelasticity of micromorphic materials under the influence of the 
Caputo fractional derivative. These steps represent the algorithmic perspective on the 
theory developed throughout this chapter. They create a pattern that can be applied to 
other materials in order to achieve the same objectives, and thus constitute a basis for 
the diversification of the present study. The steps and “sub-steps” establish a 
follow-up scheme that facilitates the approach to this subject. They lead to the 
conclusion that the reciprocity in the linear theory of micromorphic materials that 
uses the fractional derivative is exactly the same as the reciprocity in the classical 
thermoelasticity of these materials, and they provide clear schemes for obtaining the 
constitutive equations and the form of the non-Fourier heat equation. This algorithmic 
method can also be applied to other problems, both within the theory developed in this 
chapter and in theories related to other materials, which allows this perspective to be 
extended to various problems. 


Numerical Algorithms in Mechanics of Generalized Continua 

Adina Chirila and Marin Marin 


Abstract We study from a numerical point of view the solution of the system of 
partial differential equations arising in the theory of isotropic dipolar thermoelasticity 
with double porosity. We write the variational formulation and introduce fully discrete 
approximations by using the finite element method to approximate the spatial variable 
and the backward Euler scheme to discretize the first-order time derivatives. By using 
these algorithms, we perform numerical simulations in order to show the behaviour 
of the solution. To this end, we use the finite element software FreeFem++. 


Keywords Dipolar thermoelasticity - Double porosity - Mechanics of generalized 
continua * Finite element method 


1 Introduction 


The Cosserat brothers laid the foundations of mechanics of generalized continua 
in 1909. Ever since, many studies have been published in this field, see 
[1, 4, 8, 14, 23]. 

Mechanics of generalized continua is a field of applied mathematics which studies 
materials with microstructure. An important class of such materials are the ones with 
vacuous pores. In fact, porous elasticity is an extension of the classical theory of 
elasticity in which we consider an extra degree of freedom for each material particle, 
namely the change in the volume fraction field from the reference configuration. It 
has applications in different areas of engineering and biology, such as petroleum 
industry, materials science, ceramics, pressed powders and bones [9]. 

Another important theory in the study of materials with microstructure is the 
theory of dipolar elasticity. It was first studied by Mindlin [24] and Green and Rivlin 
[11] and more recently in [2, 19, 20, 27]. It assumes that each particle has twelve 
degrees of freedom, namely three translations and nine micro-deformations. 

The mathematical model of dipolar thermoelasticity with double porosity was 
introduced in [22]. The authors highlight the fact that the displacement field influ- 
ences the double porosity structure of the body, as discussed in [15]. 

There are other theories of elasticity in which several effects are combined. For 
example, porous and strain gradients effects are studied in [9, 13], porous and dipolar 
effects are analyzed in [21]. 

In the mathematical model of elasticity with double porosity, we consider that 
at the macro-level we have the pores of the body, while at the micro-level we have 
the fissures or cracks of the skeleton [15]. We also encounter the concept of triple 
porosity, where we have voids at the macro-, meso- and micro-level, see [26]. 

Different problems arising in mechanics of generalized continua were studied 
from a numerical point of view, see [5, 10, 12, 18, 27]. There are various discretiza- 
tion methods, such as finite difference, finite element, finite volume, discontinuous 
Galerkin. The finite difference method is an external approximation, while the finite 
element method is an internal one. Most of the constants are determined by exper- 
iments, see [6]. For the numerical treatment of the derivative with respect to time, 
some authors use the Newmark scheme [17], or the backward Euler scheme [9, 10]. 

In this chapter, we study the theory of thermoelasticity when double porosity and 
dipolar effects are combined. We write the variational formulations corresponding to 
the system of partial differential equations in the three-dimensional case and under 
the conditions of plane strain. Then, we discretize the spatial variable by the finite 
element method and the derivative with respect to time by the implicit Euler scheme. 
We consider a two-dimensional rectangular domain, then plot the displacement for 
different values of the internal length scale and the temperature. We also consider a 
circular hole in the center of the domain and plot the displacement. 

The chapter is structured in five sections. In Sect. 2 we give some preliminaries 
regarding the basic ideas of the finite element method and the mathematical model 
which will be discretized by this method. In Sect. 3 we derive the variational 
formulations. In Sect. 4 we present some numerical simulations for the two-dimensional 
problem and in Sect. 5 we draw the conclusions of our study. 


2 Preliminaries 


We study an isotropic dipolar thermoelastic material with double porosity that lies 
in a bounded domain $\Omega$ of the three-dimensional Euclidean space with the boundary 
$\partial\Omega$. 

If we write a dot over a variable, then we mean that we differentiate the corre- 
sponding variable with respect to time. If a variable has a subscript preceded by a 
comma, then it is partially differentiated with respect to the corresponding spatial 
coordinate. As a notational convention, we use the Einstein summation. 


2.1 The Finite Element Method 


The finite element method is used for computing numerical solutions of a certain 
boundary value problem and it is based on its weak form instead of its local form. 
Therefore, we derive a variational formulation of the problem. In this form, existence 
and uniqueness can be shown by the Lax-Milgram lemma [3]. 

Then, we require a finite element mesh. On this mesh we find the approximation 
to the weak solution by using shape functions and nodal values. The test functions are 
projected onto finite-dimensional spaces. In this way, the continual weak formulation 
is approximated in a finite-dimensional space. The resulting system of linear algebraic 
equations is solved by various methods. Recently, an efficient quantum algorithm was 
proposed for this purpose [25]. The motivation is the following: the large systems 
of linear equations that occur in the FEM are produced algorithmically, rather than 
being given directly as input and the FEM naturally leads to sparse systems of linear 
equations, which is usually a requirement for a quantum speedup [25]. 
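
As a minimal illustration of this chain (weak form, mesh, shape functions, sparse linear system), the sketch below solves the model problem $-u'' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$ using linear elements on a uniform mesh. It is not the model of this chapter and does not use FreeFem++; the model problem, the nodal quadrature of the load and the function names are assumptions made for the example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(diag)
    d, b = diag[:], rhs[:]
    for i in range(1, n):
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        b[i] -= w * b[i - 1]
    x = [0.0] * n
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x

def fem_poisson_1d(f, n_elements):
    """Linear finite elements for -u'' = f on (0, 1) with u(0) = u(1) = 0."""
    h = 1.0 / n_elements
    n = n_elements - 1                      # number of interior nodes
    nodes = [(i + 1) * h for i in range(n)]
    # Stiffness matrix of the hat functions: tridiagonal (-1, 2, -1) / h
    diag = [2.0 / h] * n
    off = [-1.0 / h] * (n - 1)
    # Load vector with nodal quadrature: integral of f * phi_i is about h * f(x_i)
    rhs = [h * f(x) for x in nodes]
    return nodes, solve_tridiagonal(off, diag, off, rhs)

nodes, u = fem_poisson_1d(lambda x: 1.0, 10)
# For f = 1 the exact solution is u(x) = x (1 - x) / 2
max_error = max(abs(ui - xi * (1.0 - xi) / 2.0) for xi, ui in zip(nodes, u))
```

The system matrix is sparse (here tridiagonal) because each shape function overlaps only with its neighbours, which is the structural property referred to above.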

Nonetheless, we will solve the classical case with FreeFem++, which is an impor- 
tant finite element software. In FreeFem++, there are several methods of solving the 
discretized form of a system of partial differential equations and the matrix of the 
underlying linear system is stored differently, depending on the choice of the solver’s 
type and specific algorithms. For example, the matrix is sky-line non symmetric for 
LU, it is sky-line symmetric for Crout, it is sky-line symmetric positive definite for 
Cholesky, it is sparse symmetric positive for CG and it is just sparse for GMRES, 
sparsesolver or UMFPACK [7]. 


2.2 The Mathematical Model to Be Discretized by the Finite 
Element Method 


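We study the behaviour of a body which obeys the following equations [5] for the 
displacement vector, the microdeformation tensor, the pressure due to pores, the 
pressure due to cracks and the temperature 

$$
\begin{aligned}
& (\tau_{ji} + \eta_{ji})_{,j} + \rho_0 F_i = \rho_0 \ddot{u}_i, \\
& \mu_{ijk,i} + \eta_{jk} + \rho_0 M_{jk} = I_{ks}\ddot{\phi}_{js}, \\
& \sigma_{i,i} + \xi + \rho_0 G = \kappa_1 \ddot{\varphi}, \\
& \tau_{i,i} + \zeta + \rho_0 L = \kappa_2 \ddot{\psi}, \\
& \rho_0 T_0 \dot{\eta} = q_{i,i} + \rho_0 Q
\end{aligned} \tag{1}
$$

in $\Omega \times (0, \infty)$. Below we give the constitutive equations in the isotropic form [5] 

$$
\begin{aligned}
\tau_{pq} ={}& \lambda\varepsilon_{ii}\delta_{pq} + 2\mu\varepsilon_{pq} + g_1\delta_{pq}\kappa_{nn} + g_2\kappa_{pq} + g_2\kappa_{qp} + d_1\delta_{pq}\varphi + d_2\delta_{pq}\psi - \alpha\delta_{pq}\theta, \\
\eta_{pq} ={}& g_1\varepsilon_{ii}\delta_{pq} + 2g_2\varepsilon_{pq} + b_1\kappa_{ii}\delta_{pq} + b_2\kappa_{pq} + b_3\kappa_{qp} + d_3\delta_{pq}\varphi + d_4\delta_{pq}\psi - \beta\delta_{pq}\theta, \\
\mu_{pqr} ={}& a_1(\chi_{iip}\delta_{qr} + \chi_{rnn}\delta_{pq}) + a_2(\chi_{iiq}\delta_{pr} + \chi_{mrm}\delta_{pq}) + \\
& + a_3\chi_{iir}\delta_{pq} + a_4\chi_{pkk}\delta_{qr} + a_5(\chi_{qkk}\delta_{pr} + \chi_{mpm}\delta_{qr}) + a_8\chi_{iqi}\delta_{pr} + \\
& + a_{10}\chi_{pqr} + a_{11}(\chi_{rpq} + \chi_{qrp}) + a_{13}\chi_{prq} + a_{14}\chi_{qpr} + a_{15}\chi_{rqp}, \\
\xi ={}& -(d_1\varepsilon_{ii} + d_3\kappa_{jj} + \alpha_1\varphi + \alpha_3\psi - \gamma_1\theta), \\
\zeta ={}& -(d_2\varepsilon_{ii} + d_4\kappa_{jj} + \alpha_3\varphi + \alpha_2\psi - \gamma_2\theta),
\end{aligned}
$$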
oj = dgp jy + sv j, 
Tj = A595 + doi. 
$$ \rho_0\eta = \alpha\varepsilon_{ii} + \beta\kappa_{ii} + \gamma_1\varphi + \gamma_2\psi + a\theta. \tag{2} $$
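In the expressions above, we used the following geometrical relations 

$$
\begin{aligned}
& 2\varepsilon_{ij}(u) = u_{i,j} + u_{j,i}, \\
& \kappa_{ij}(u, \phi) = u_{j,i} - \phi_{ij}, \\
& \chi_{ijk}(\phi) = \phi_{jk,i}.
\end{aligned} \tag{3}
$$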


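The unknown functions in the system of partial differential equations satisfy the 
following initial conditions 

$$
\begin{aligned}
& u_i(x, 0) = u_i^0(x), && \dot{u}_i(x, 0) = u_i^1(x), \\
& \phi_{ij}(x, 0) = \phi_{ij}^0(x), && \dot{\phi}_{ij}(x, 0) = \phi_{ij}^1(x), \\
& \varphi(x, 0) = \varphi^0(x), && \dot{\varphi}(x, 0) = \varphi^1(x), \\
& \psi(x, 0) = \psi^0(x), && \dot{\psi}(x, 0) = \psi^1(x), \\
& \theta(x, 0) = \theta^0(x), && x \in \Omega
\end{aligned} \tag{4}
$$

and boundary conditions 

$$
\begin{aligned}
& u_i(x, t) = 0, \quad \phi_{ij}(x, t) = 0, \quad \varphi(x, t) = 0, \\
& \psi(x, t) = 0, \quad \theta(x, t) = 0, \quad (x, t) \in \partial\Omega \times (0, \infty),
\end{aligned} \tag{5}
$$

where $u_i^0$, $u_i^1$, $\phi_{ij}^0$, $\phi_{ij}^1$, $\varphi^0$, $\varphi^1$, $\psi^0$, $\psi^1$, $\theta^0$ are prescribed functions. 

The quantities in the equations above have the physical interpretation described 
in Tables 1 and 2. 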

Table 1 Physical interpretation 

$\tau_{ij}$              The components of the Cauchy stress tensor 
$\eta_{ij}$              The components of the relative stress tensor 
$\mu_{ijk}$              The components of the double stress tensor 
$\rho_0$                 The constant mass density 
$I_{ks}$                 The coefficients of microinertia 
$F_i$                    The body force per unit mass 
$M_{jk}$                 The dipolar body force per unit mass 
$\sigma_i$               The equilibrated stress vector associated with pores 
$\tau_i$                 The equilibrated stress vector associated with fissures 
$\xi$                    The intrinsic equilibrated body force associated with pores 
$\zeta$                  The intrinsic equilibrated body force associated with fissures 
$G$                      The extrinsic equilibrated body force per unit mass associated with pores 
$L$                      The extrinsic equilibrated body force per unit mass associated with fissures 
$\kappa_1$, $\kappa_2$   The coefficients of the equilibrated inertia 


Table 2 Physical interpretation 

$u_i$                    The components of the displacement vector field 
$\phi_{ij}$              The components of the dipolar tensor field 
$\varphi$                The pressure due to pores 
$\psi$                   The pressure due to cracks 
$\theta$                 The temperature 
$q_i$                    The heat flux vector 
$\eta$                   The entropy 
$Q$                      The heat supply per unit mass 
$S$                      The heat supply per unit volume 


Several results have been proven for the model described above in the anisotropic 
case in [22]. First of all, it is shown that the first initial boundary value problem, 
corresponding to null loads and null boundary conditions, has at most one solution 
[22, Theorem 1]. This holds true if the constitutive coefficients satisfy some symmetry 
conditions and the thermal conductivity tensor, the mass density, the coefficients 
of equilibrated inertia and the thermal capacity satisfy some positivity assumptions. 
Under these conditions and by considering that some quantity describing the energy 
of the system at the initial time is negative, the solution is also shown to be 
asymptotically unbounded. Second of all, the mixed initial boundary value problem from [22] 
is transformed into an abstract evolution equation in a proper Hilbert space. Under 
the symmetry and positivity conditions from [22, Theorem 1], the corresponding 
operator is shown to be the generator of a $C^0$ semigroup of contractions. Then the 
existence of a unique solution follows easily. A globally bounded solution is obtained 
for homogeneous loading. 


3 Variational Formulation 


In order to perform the numerical simulations, we need the variational formulation 
corresponding to our system of partial differential equations. This is obtained by 
multiplying the equations by suitably chosen test functions, by integrating over $\Omega$ 
and by further using the divergence theorem. Note that we use a system approach, 
which is needed in FreeFem++ in order to impose the right boundary conditions. 
The spaces for the solution and the test functions will be specified by the end of this 
section. 

Below we write the variational formulation for u; corresponding to the system of 
Eqs. (1); 


/ Poli pZpdV Se i {A a R1)UiiZp,p + (+ 82)(Up,q + Ug, p)Zq,p+ 

Q Q 

+ (g1 a b1)Ui,iZp,p F (g2 + b2)Ugq,pZq,p + (g2 + b3)Up,qZq,p} dV= 

~ | poF pZpdV +f (tog + Npq)ZqnpdA +f {(g1 + bi) diiZp,pt (6) 
Q OQ Q 


+ (g2 + b2)bpqZq,p + (g2 + b3) PapZq,p — (di + d3) PZ p,p— 
—(dz + d)Wzp,p + (a+ B)OZp,p} dV. 


In the formulation above, $z(x)$ is a sufficiently differentiable vector function that
satisfies homogeneous boundary conditions. Note that we have null boundary conditions
for $u_i$, which are motivated by the existence and uniqueness results. Hence the
boundary is fixed and there is no displacement on it. If instead of homogeneous
Dirichlet boundary conditions we choose inhomogeneous Neumann boundary conditions,
this means that an external surface force is applied. As in [9], we choose the space
$[H_0^1(\Omega)]^3$ for the test function and define $Q_u = L^2(0, T; [H_0^1(\Omega)]^3)$.

In the sequel, we write the variational formulation for $\phi_{ij}$, with the test function
$y(x)$ sufficiently differentiable


/ I bqrPqrdV ae / (didins oun a 41 Pan,r? pr, p oF 42 Pig,i Pap,p+ 
Q Q 


p 42Orm,m¥ pr, p a3 0ir iP pr,p + a4bkk,p Laq.p ae a5 Pkk,qLap.p+ 
oF 45P pm,mLaq,p oP Aghgi,i Pap, p + A10Par,p Lar. p + 11 Ppq.r Par,p+ 
+411 brp,qPar,p + 1139rq,pPqr,p + Q14Yqr,pP priq + G15 Por,pPap.r) dV+ 


+ : (b1 bit Paq + bodbgr Par + b3dbrqVaqr) dV= 
Q 

= [ Lepr Part pdA a i: [g1Mi,iPaq + 82(Ug,r a Ur.g Par ae bit Pqq+ 
Q 


+boty.g Par =F b3Ug,r Lar + d32Lqq “F dW Laq = BOQqq| dV + / PoMgrPqrd V. 
Q 
(7) 
We choose the space $[H_0^1(\Omega)]^{3 \times 3}$ for the test function and define
$Q_\phi = L^2(0, T; [H_0^1(\Omega)]^{3 \times 3})$.

In the following, we consider the variational formulations for the fields that
describe the porosity change from Eqs. (1)$_3$ and (1)$_4$, as in [17]. To this end,
we introduce the following scalar products


$$
a(\varphi, w) = \int_\Omega (d_4 \varphi_{,i} w_{,i} + a_1 \varphi w)\, dV, \qquad
b(\psi, w) = \int_\Omega (d_6 \psi_{,i} w_{,i} + a_2 \psi w)\, dV. \tag{8}
$$


In the expressions above, the functions involved, in particular $w(x)$, are sufficiently
differentiable. As in [17], we denote by $H_\varphi(\Omega)$ the Hilbert space obtained as the
closure of the set of functions $w \in C^1(\Omega)$ in the norm generated by the scalar
product $a(\varphi, w)$ from (8)$_1$. Likewise, we denote by $H_\psi(\Omega)$ the Hilbert space
obtained as the closure of the set of functions $w \in C^1(\Omega)$ in the norm generated by
the scalar product $b(\psi, w)$ from (8)$_2$.
Following [17], we recall that for a Banach space $X$ with norm $\| \cdot \|_X$, the space
$L^2(0, T; X)$ is the space of functions defined on $[0, T]$ with values in $X$ that obey the
condition $\int_0^T \|f(t)\|_X^2 \, dt := \|f\|_{L^2(0,T;X)}^2 < \infty$. Now we define the functional
spaces $Q_\varphi = L^2(0, T; H_\varphi(\Omega))$ and $Q_\psi = L^2(0, T; H_\psi(\Omega))$.
Below we give the variational formulation for $\varphi$


[ rdwav - [ oiwnidA + [ (dayiw,; + dsa)jw,i) dV+ 
Q dQ Q 


+ f ahenw + daxiw + apw+ agdw—oudv — f ppGwav =o. 
Q Q 
(9) 


We also write the variational formulation for $\psi$


/ KopwdV — : 7wnjdA + i. (dspiwi + dowiwi)dV+ 

Q aa Q 

(10) 

+ / (dyE;w + dkjjw + arww + a3yw — y20w) dV — i poLwdV = 0. 
Q Q 


Finally, we write the variational formulation for $\theta$ with $\chi(x)$ a sufficiently
differentiable function


/ [(a+ Bie Tox — BG: ToX + Tox + 2bTox + abTox] dV = 
Q 


ay aixmdd— f kBix.dV + | poSxaV. 
aa Q Q 


As in [9], we choose the space $H_0^1(\Omega)$ for the test function and define
$Q_\theta = L^2(0, T; H_0^1(\Omega))$.


(11) 




Taking the results above into consideration, the functions $u \in Q_u$, $\phi \in Q_\phi$,
$\varphi \in Q_\varphi$, $\psi \in Q_\psi$, $\theta \in Q_\theta$ are the weak solutions of the dynamical
problem of isotropic dipolar thermoelasticity with double porosity if the initial
conditions (4) hold true and Eqs. (6), (7), (9), (10) and (11) are satisfied for any
$t \in [0, T]$ and any $z \in [H_0^1(\Omega)]^3$, $y \in [H_0^1(\Omega)]^{3 \times 3}$, $w \in H_\varphi(\Omega)$ in (9),
$w \in H_\psi(\Omega)$ in (10) and $\chi \in H_0^1(\Omega)$.

In the following, we focus on the two-dimensional case. The conditions of plane
strain are the same as in [27]. To be more specific, $\Omega$ is now a plane region
in $\mathbb{R}^2$ and $x = (x_1, x_2)$. Moreover, $u = (u_1, u_2)$, $u_j = u_j(x_1, x_2)$ for $j = 1, 2$,
$\phi = (\phi_{11}, \phi_{22}, \phi_{12}, \phi_{21})$, $\phi_{ij} = \phi_{ij}(x_1, x_2)$ for $i, j = 1, 2$,
$\varphi = \varphi(x_1, x_2)$, $\psi = \psi(x_1, x_2)$ and $\theta = \theta(x_1, x_2)$.

For example, the variational formulation for $\varphi$ under the conditions of plane strain
becomes


/ ki ~wdV +f (dapiw.1 + dayow2 + dsyiwat d51)2W.2) dV— 
Q Q 
= i oiwnd A+ i [ur iw(ds + ds) + ur2wldi +ds)—dsouw— (12) 
aa Q 
—doxw + ayyw + a3yw — y0w)]dV — / poGwdV = 0. 
Q 


Moreover, the variational formulation for $\psi$ under the conditions of plane strain
becomes


: KowdV +f (dsp1w1 + dsy2w.2 + dew iw + det) 2w,2) dV— 
Q Q 
: / rund A+ / [ur 1w(de + d) + wr 2w(dr + d) — dow (13) 
aa Q 
—dénw + agdw + agpw — 720w)] dV — poLwdV = 0. 
Q 


Finally, the variational formulation for $\theta$ under the conditions of plane strain is


/ [(a + B)(i11 + t2,2) Tox — B(bu + $22) Tox + MPToX + bToX+ 
Q 
(14) 


+atTox]av = [ aixnda— [ (ix + kaxa)av + | poSxdV. 
aa Q Q 


The test functions and the weak solution are chosen in suitable function spaces, based
on the choices corresponding to the three-dimensional case.
4 Numerical Simulations


In Table 3, we summarize the main steps that are required in order to compute a 
numerical solution of the initial boundary value problem of isotropic dipolar ther- 
moelasticity with double porosity, starting from the discretization of the variational 
formulation of the corresponding system of partial differential equations under the 
conditions of plane strain, as derived in the previous section. Note that for the numer- 
ical simulations we use the finite element software FreeFem++ 3.54. 

We choose a suitable triangulation in order to spatially discretize the plane
domain. We use the space P1 of piecewise linear continuous finite elements, where
the degrees of freedom are the values at the vertices. A detailed account of appropriate
finite element spaces for the model of dipolar elasticity is given in [27].

The values of the material parameters are taken from [5, 27]. We use dimensionless 
variables for the numerical simulations. 

As initial condition, we set u;(0, x, y) = as for (x, y) € Q. The other initial 
conditions are set to zero. 

The backward Euler scheme is used for the discretization of the derivative with
respect to time, as in [9]. In fact, we will use the variational formulation of the problem
in terms of the velocity vector $\dot{u}$, the speed of the microdeformation tensor $\dot{\phi}$, the
speed of the porosity due to pores $\dot{\varphi}$, the speed of the porosity due to cracks $\dot{\psi}$ and
the temperature $\theta$. Then, the displacement, the microdeformation and the porosity
fields are recovered from the relations


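$$
u(t) = \int_0^t \dot{u}(s)\, ds + u^0, \qquad \phi(t) = \int_0^t \dot{\phi}(s)\, ds + \phi^0,
$$
$$
\varphi(t) = \int_0^t \dot{\varphi}(s)\, ds + \varphi^0, \qquad \psi(t) = \int_0^t \dot{\psi}(s)\, ds + \psi^0. \tag{15}
$$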


For the time discretization, we consider a uniform partition of the time interval
$[0, T]$ with step size $dt = T/N$ and nodes $t_n = t_{n-1} + dt$ for $n = 1, \ldots, N$ such that
$0 = t_0 < t_1 < \cdots < t_N = T$.
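
The recovery of a field from its rate on the uniform partition can be illustrated with a short Python sketch. It assumes, which the chapter does not state, that the time integral is approximated by a right-endpoint rectangle rule in the spirit of the backward Euler step. The function name `recover_field` and all numerical values are hypothetical.

```python
def recover_field(initial_value, rates, dt):
    """Accumulate a field from its rate: f^n = f^(n-1) + dt * v^n.

    Assumption: right-endpoint quadrature of
    f(t) = f(0) + integral_0^t v(s) ds on a uniform partition.
    """
    values = [initial_value]
    for v_n in rates:                 # v_n is the rate at node t_n, n = 1..N
        values.append(values[-1] + dt * v_n)
    return values


T, N = 1.0, 4
dt = T / N                                 # uniform step size
nodes = [n * dt for n in range(N + 1)]     # 0 = t_0 < t_1 < ... < t_N = T
# Hypothetical constant rate 2.0 at every node, initial value 0.5
u = recover_field(0.5, [2.0] * N, dt)
```

In a finite element code the same update would be applied to every degree of freedom after each time step.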


Table 3 Steps of the implementation in FreeFem++ 
1. Define the boundary of the domain by using the keyword border 


2. Build the mesh by using the keyword mesh 


3. Choose appropriate finite element spaces by using the keyword fespace 


4. Define the constants 


5. Choose appropriate initial conditions 


6. Write the variational formulations for the displacement vector, the microdeformation tensor, 
the pressure due to pores, the pressure due to cracks and the temperature by using the keyword 
problem 


7. Write a for loop in order to update the value of the physical quantities involved at each time 
step and plot the corresponding solution 
Fig. 1 Temperature $\theta$


Fig. 2 Displacement $u_1$ for different values of the dipolar parameters at the same iteration


Fig. 3 Mesh for a rectangular domain with a circular hole


Finally, in Fig. 1 we depict the temperature.

In Fig. 2 we plot the displacement $u_1$ for different values of the dipolar parameters.
To be more precise, for the second plot we multiply the values corresponding to the
first plot in Fig. 2 by 10 and we note that increasing the internal length scale
increases the stiffness of the body. This was also observed in [27].

In the following we consider a rectangular domain with a circular hole in the
middle. We depict the mesh in Fig. 3 and plot the displacement $u_1$ in Fig. 4.


5 Discussion and Conclusion 


According to [18], approaches based on extended continuum modelling are necessary 
today, since many innovative engineering materials are characterized by an inherent 
microstructure. 
Fig. 4 Displacement $u_1$


In this chapter, we have written the variational formulations for the system of par- 
tial differential equations characterizing the dynamical problem of dipolar isotropic 
thermoelasticity with double porosity. Then we discretized them by applying the 
finite element method for the spatial variable and the implicit Euler scheme for the 
time derivative. In order to compute a numerical solution, we wrote a computer 
program in FreeFem++. 

We focused on the case of plane strain. First of all, we studied a rectangular 
domain and plotted the displacement for different values of the internal length scale. 
Then we plotted the temperature. Second of all, we considered a circular hole in the 
center of the domain and graphically depicted the displacement. 

We can conclude that the mechanics of generalized continua is a field of applied
mathematics in which systems of partial differential equations arise that can be
discretized by the finite element method. This technique leads to various numerical
algorithms, depending on the type of partial differential equations involved (either
parabolic or hyperbolic).
A Practical Review on Intrusion
Detection Systems by Known Data
Mining Methods


Meisam Samareh Ghasem and Marjan Kuchaki Rafsanjani 


Abstract Computer networks are used in many different fields, so their security
must be ensured. Many methods have been proposed for intrusion detection in
computer networks. Software methods have been studied more than other
methods. Machine learning methods perform well. In this paper, some
classification and clustering methods are reviewed. Their structure is
described, their use and tuning are explained, and their main advantages and
disadvantages are studied. Finally, after the methods have been introduced,
described and criticized, experiments are carried out and the results of the
introduced methods are compared on the KDD Cup 99 data set.


Keywords Intrusion detection system - Data mining - Classification - Clustering 


1 Introduction 


Nowadays, computer networks have become an integral part of life. Moreover,
computer networks are no longer limited to computers; for example, smartphones and
tablets have created new communication networks. As the use of computers has
increased, attacks have increased as well, so attention to security is essential.
Traditional methods have been proposed to prevent attacks on computer networks, but
these methods are not sufficient to secure computer networks [1-5]. A secure network
must guarantee three properties of the data (packets) transmitted between computers
[5-7]:


• Integrity: only a permitted node can change received data and send the new
information to other authorized nodes. This property ensures that data is
protected from illegal changes during transfer.

• Availability: as the name suggests, data is available at the required time.
Whenever an authorized node needs data, this property ensures that the data is
available.

• Confidentiality: data confidentiality defines the range of access to data for each
node. With this property, a node cannot access all transmitted data, but only
the data it needs, under a specific policy.


New approaches have been introduced to provide network security. For example,
data mining methods have been presented, and research in this field is still
ongoing. In this paper, some methods for handling the intrusion detection problem
are reviewed.

The remainder of this paper is organized as follows: Sect. 2 defines the intrusion 
detection systems (IDS) and data mining methods. In Sect. 3 classification methods 
are examined. Section 4 presents clustering methods. Experiments and results are 
shown in Sect. 5 and finally, we conclude the paper in Sect. 6. 


2 Preliminaries 


2.1 Intrusion Detection Systems (IDSs) 


An intrusion is any illegal action that (i) accesses information without changing
it (eavesdropping), (ii) changes or manipulates information, or (iii) renders the
system useless or unavailable [7]. Briefly, it is any action that violates the
properties of a secure network [8-10]. An intrusion detection system is software
with detection functions that detects abnormal and malicious activities and reports
them to the network administrators [2, 5, 6, 10].

An IDS uses previously implemented detector functions to detect attacks,
unauthorized behavior, and inaccurate or abnormal activities, and it alerts the
network administrators [11]. There are two types of IDS: (i) host-based IDS
(HIDS) and (ii) network-based IDS (NIDS). A HIDS checks and analyzes the internal
computing of a host, whereas a NIDS acts on a larger scale and deals with network
data [7, 12].


2.1.1 Detection Approaches 


According to recent research, there are three types of detection: (i) anomaly, (ii)
misuse, and (iii) hybrid, a combination of anomaly and misuse detection. The
detection rate (DR) is an important criterion for evaluating the performance of IDS
methods; it is the ratio of detected attacks to total attacks. Another criterion is
the false positive rate (FPR), which is the ratio of normal connections classified
as attacks to total normal connections [6, 10, 13].
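
As a small illustration of the two ratios, the following Python sketch computes DR and FPR from lists of actual and predicted labels. The label strings, the function names and the eight sample connections are hypothetical and serve only to make the definitions concrete.

```python
def detection_rate(actual, predicted):
    """DR = detected attacks / total attacks."""
    attacks = [p for a, p in zip(actual, predicted) if a == "attack"]
    return sum(p == "attack" for p in attacks) / len(attacks)


def false_positive_rate(actual, predicted):
    """FPR = normal connections flagged as attacks / total normal connections."""
    normals = [p for a, p in zip(actual, predicted) if a == "normal"]
    return sum(p == "attack" for p in normals) / len(normals)


# Hypothetical labels for eight connections
actual = ["attack", "attack", "attack", "attack",
          "normal", "normal", "normal", "normal"]
predicted = ["attack", "attack", "attack", "normal",
             "normal", "normal", "normal", "attack"]
dr = detection_rate(actual, predicted)         # 3 of 4 attacks are detected
fpr = false_positive_rate(actual, predicted)   # 1 of 4 normal connections is flagged
```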


• Anomaly detection: a set or database of the normal, authorized behavior of the
system is specified, and every act that contravenes it is a candidate for being
malicious [2, 6, 13]. The advantages of these methods are that they can discover
new and unknown attacks and that they have a low false negative rate (FNR); their
disadvantages are a high FPR and a low DR for known attacks [7, 8, 10, 14].

• Misuse or signature-based detection: a pattern of abnormal behavior, called a
signature, is formed first, and future events are classified based on this
pattern [2, 5, 8, 10]. The advantages of misuse detection are a low FPR and a
high DR; its disadvantages are a low capacity to detect unknown attacks and the
difficulty of keeping the attack database up to date [5, 6, 10].


2.2 Traditional Approaches 


Traditional approaches to preventing attacks include firewalls, user authentication,
avoiding programming errors, data encryption, access control schemes and virtual
private networks (VPNs). Firewalls are the first barrier and the first layer of
security that defends sensitive data [15].

Firewalls are not sufficient to secure computer networks against advanced
attacks, because they cannot protect network services whose ports must remain
open [10, 13, 16].

Programming errors are inevitable in big and complex systems, so these probable
bugs can be the starting point of some attacks. In theory, data encryption is a
good method for preventing attacks. However, if an encryption system is simple, it
is easily broken, and if it is complex, it causes computational overhead
[10, 13, 15, 17].


2.3 New Approaches 


New algorithms have been introduced to improve IDS software for computer network
security.


• Rule-based expert systems: these systems use if-then rules to detect attacks.
Although they improve on traditional methods and have a good DR, they require
an expert to extract the rules. Moreover, when the system becomes large and
complex its performance degrades, the rules must be updated regularly, and
performance against new attacks is poor [18].

• Statistical methods: according to the given data, these methods attempt to
model the inputs with statistical distributions and raise a warning for any
behavior that violates the models. Statistical methods have a high DR for known
attacks and a low DR for novel attacks. They need enough data to build a
complicated mathematical (statistical) model and, as with rule-based methods,
when the data set grows the DR decreases and the mathematical model must be
updated [7].

• Data mining techniques: data mining methods extract knowledge from raw data
(knowledge discovery from data). Recently, new research has begun to use data
mining to improve IDS software [8, 19].
3 Classification


The ability to classify is among the most important skills that a human learns in
life, and data mining methods try to simulate this skill. In these methods, a set of
input data is selected as training data X, and each input has a class label Y. The
data mining method learns from the training data and the target data (class labels),
so these methods are called supervised algorithms.


3.1 Artificial Neural Networks 


The idea of the artificial neural network (ANN) comes from the human brain. From
another point of view, an ANN is a tool for mathematical computation. This view is
similar to the concept of a function in mathematics, and the ANN is then called a
black box [19, 20]. ANNs are used for intrusion detection problems, and abundant
research has been presented in this field [2, 7, 10].


3.1.1 Multi-layer Perceptron (MLP) 


The multi-layer perceptron is very popular. It is a feed-forward ANN that is often
trained with the back propagation (BP) learning algorithm. BP belongs to the family
of gradient descent methods, in which the weights move in the direction opposite to
the gradient [5, 7, 10, 20-22].

In this (feed-forward) architecture there are three kinds of layers: first the
input layer, second the hidden layer and third the output layer. One layer of an
MLP is summarized in Eq. (1).


$$\text{Output} = g\left(\sum_{i=1}^{n} X_i W_i + b\right) \tag{1}$$


In this formula the $X_i$ are the inputs, $W_i$ and $b$ are the weights and the bias, and
$g$ is the transfer or activation function, which is nonlinear. The main advantages of
the MLP are that it is easy to implement and use and that it is capable of computing
any linear or nonlinear function. Its main disadvantages are low training speed,
sensitivity to the features and the need for a rich data set [5, 7, 10, 20, 21].
MLPs have been used for intrusion detection problems [11, 23-25].
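
The computation in Eq. (1) can be sketched in a few lines of Python. The sigmoid is used here as an example of the activation function g; the input, weight and bias values are hypothetical, and the helper names are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, g=sigmoid):
    """Eq. (1): Output = g(sum_i X_i * W_i + b)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return g(z)


def layer_output(inputs, weight_rows, biases, g=sigmoid):
    """One layer: apply Eq. (1) once per neuron (one weight row per neuron)."""
    return [neuron_output(inputs, w, b, g) for w, b in zip(weight_rows, biases)]


# Hypothetical values: 3 inputs feeding a layer of 2 neurons
x = [1.0, 0.0, -1.0]
W = [[0.5, 0.2, 0.5], [1.0, -1.0, 0.0]]
b = [0.0, -1.0]
out = layer_output(x, W, b)   # both weighted sums are 0, so both outputs are 0.5
```

Stacking several such layers, with the output of one layer used as the input of the next, gives the feed-forward pass of an MLP.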


3.1.2 Radial Basis Function Neural Network


Some problems are not linearly separable, but their classes can be separated by a
circle. This is a simple idea for building a powerful classifier. When the classifier
is a circle, two quantities matter: a center and the distance from this center. Such
classifiers are radial based [20]. Equation (2) gives the form of a radial basis
function (RBF).


$$f(X) = \Phi(\|X - \mu\|) \tag{2}$$


In Eq. (2), $X$ is an n-dimensional input, $\mu$ is the center and $\|\cdot\|$ is the Euclidean
distance. The architecture of one layer of the RBF neural network is shown in Eq. (3).


$$\text{Output} = g\left(\sum_{i=1}^{n} \Phi(X_i) W_i + b\right) \tag{3}$$


The parameters are the same as in Eq. (1). An RBF network requires more neurons
than an MLP, and it performs well when the training vectors are large [20]. RBF
networks have been used as IDSs [11, 24].
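
A radial unit depends only on the distance between the input and a center, which the following Python sketch makes concrete. The text does not fix the function used for the radial unit; a Gaussian is assumed here as one common choice, and the width `sigma`, the centers and the weights are hypothetical. The output function g is taken to be the identity for simplicity.

```python
import math


def euclidean(x, mu):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mu)))


def gaussian_rbf(x, mu, sigma=1.0):
    """Radial unit with an assumed Gaussian profile: exp(-||x - mu||^2 / (2 sigma^2))."""
    r = euclidean(x, mu)
    return math.exp(-(r ** 2) / (2.0 * sigma ** 2))


def rbf_network_output(x, centers, weights, bias, sigma=1.0):
    """Weighted sum of radial units plus bias (identity output function assumed)."""
    return sum(w * gaussian_rbf(x, mu, sigma) for mu, w in zip(centers, weights)) + bias


centers = [[0.0, 0.0], [3.0, 4.0]]    # hypothetical centers
weights = [1.0, -1.0]
at_center = gaussian_rbf([0.0, 0.0], centers[0])   # zero distance gives the maximum response
score = rbf_network_output([0.0, 0.0], centers, weights, bias=0.0)
```

The response of each unit decays as the input moves away from its center, so the sign of `score` tells which center the input is closer to in this two-unit example.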


3.2 Support Vector Machines (SVMs) 


Support vector machines are supervised algorithms that are used for classification
(support vector classifier, SVC) and regression (support vector regression, SVR).
These methods are used in problems with two output classes [26].

This classifier finds a hyperplane that creates the maximum separation between the
two classes; the best hyperplane creates the maximum margin between them. In
problems that are not linearly separable, SVMs map the original problem from the
input space to a feature space. This mapping is known as the kernel trick [19, 27].

The advantages of this method are that it is not sensitive to the number of
features, it avoids overfitting, it escapes local optima, and it learns very
accurately and effectively when data are rich. The disadvantages are that the method
is a binary classifier, it is sensitive to the number of data, its training and
testing processes are slow, and it requires powerful hardware for computation [28].
SVMs have been used for intrusion detection problems [7, 10, 29-33].
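
For the linearly separable case, the role of the hyperplane and of the margin can be sketched as follows. The sketch does not train an SVM; it only evaluates a given hyperplane w.x + b = 0. It assumes the usual canonical scaling in which the closest training points satisfy |w.x + b| = 1, so that the margin width is 2/||w||. The values of `w` and `b` are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign selects one of the two classes."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical separating hyperplane 3*x1 + 4*x2 - 5 = 0
w, b = [3.0, 4.0], -5.0
label_a = classify(w, b, [2.0, 2.0])    # 3*2 + 4*2 - 5 = 9, class +1
label_b = classify(w, b, [0.0, 0.0])    # score -5, class -1
width = margin_width(w)                 # 2 / 5
```

Training an SVM amounts to choosing w and b so that this width is as large as possible while the training points stay on the correct side.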


3.3 Decision Trees 


Decision trees are formed of nodes and edges. The first node is called the root,
internal nodes are labeled by attributes and the end nodes (leaves) are labeled by
class labels. Training is a top-down, recursive procedure: in each step, the
attribute that gives the best classification is selected and removed from the set
of attributes under consideration. This procedure runs until all attributes have
been selected and used in the tree.

The main advantages of decision trees are that they avoid overfitting, handle
high-dimensional input data sets, create rules and are suitable for numerical and
categorical data sets. Some disadvantages are that building an optimal decision
tree from a training data set is an NP-complete problem, that tree building does
not backtrack, and that decision tree learning is an unstable algorithm. These
methods have been used for intrusion detection problems [18, 34].
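
The attribute selection step can be sketched in Python. The text does not name the selection criterion; information gain (entropy reduction) is assumed here as one common choice. The two attributes and the four sample connections are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, candidates):
    """Attribute with the largest information gain among the candidates."""
    return max(candidates, key=lambda a: information_gain(rows, labels, a))


# Hypothetical connections described by (protocol, flag)
rows = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"), ("udp", "S0")]
labels = ["normal", "attack", "normal", "attack"]
root = best_attribute(rows, labels, [0, 1])   # the flag (index 1) separates the classes
```

A full tree builder would call `best_attribute` recursively on each group of rows, removing the chosen attribute from the candidates at every level.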


3.4 Lazy Learners 


Lazy learning, or instance-based learning, is a class of algorithms used for
classification and regression. The method stores the training data and does nothing
until test data arrive, so it delays the classification process until test data
must be predicted.


3.4.1 K-Nearest-Neighbor


K-Nearest-Neighbor (K-NN) stores the training data, and test data are classified by
their K nearest neighbors. The algorithm requires a criterion to compare neighbors.
The Euclidean distance is a good candidate, see Eq. (4).


$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{4}$$


With this measure and a value of K, a test sample is assigned to its nearest
(minimum distance) previously stored neighbors. The main advantages of K-NN are that
the training time is low and the implementation is simple. The main disadvantages
are that prediction takes a long time and that a large space is required to store
the training data set. Altogether, a lazy learner is useful when the data set is
large with few features [26, 35]. K-NN has been used for intrusion detection [7, 36].
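
A minimal Python sketch of K-NN with the Euclidean distance of Eq. (4) follows. It assumes the usual majority vote among the K nearest stored samples; the training points and labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Eq. (4): square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_x, train_y, query, k):
    """Majority label among the k stored samples closest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
prediction = knn_predict(train_x, train_y, (4.5, 5.0), k=3)
```

Note that there is no training function at all: the whole cost is paid in `knn_predict`, which sorts the stored samples for every query.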


3.5 Bayesian Methods 


There are problems in which the relation between the training data and the class
label is nondeterministic. This section introduces a model of the probabilistic
relation between training data and class label based on Bayes' theorem, see Eq. (5).
Bayesian methods are probabilistic classification models [26].


$$P(Y|X) = \frac{P(X|Y) P(Y)}{P(X)} \tag{5}$$


According to Bayes' theorem in Eq. (5), the posterior probability $P(Y|X)$ of the
target class is calculated from the prior probability $P(Y)$ and the class-conditional
probability, or likelihood, $P(X|Y)$.


3.5.1 Naive Bayesian Classifier 


Consider X as the training data and Y as the target, with values in
$R = \{v_1, v_2, \ldots, v_r\}$. The posterior probability $P(Y|X)$ is calculated by Eq. (6):


$$P(Y|X) = \frac{P(X|Y) P(Y)}{\sum_{i=1}^{r} P(X|Y = v_i) P(Y = v_i)} = \frac{P(X|Y) P(Y)}{P(X)} \tag{6}$$


To classify a new example X, the maximum a posteriori (MAP) hypothesis is used: the
posterior probability $P(Y|X)$ in Eq. (6) is evaluated for the new example X and the
class with the maximum value is chosen. The problem is therefore to find the class
that maximizes $P(Y|X)$, that is:


$$y_{MAP} = \arg\max_{Y} P(Y|X) = \arg\max_{Y} \frac{P(X|Y) P(Y)}{P(X)} \tag{7}$$


To maximize Eq. (7), it is enough to maximize the numerator, since $P(X)$ is a
normalization term that is the same for every class. Therefore, sample X is a
member of class $v_k$ if and only if:


$$P(X|Y = v_k) P(Y = v_k) > P(X|Y = v_i) P(Y = v_i), \quad i = 1, 2, \ldots, r, \; i \neq k \tag{8}$$


The naive Bayesian classifier is based on the assumption that the features are
conditionally independent given the class, so the likelihood in Eq. (6) becomes:


$$P(X|Y = v) = P(X_1|Y = v) \times \cdots \times P(X_n|Y = v) = \prod_{i=1}^{n} P(X_i|Y = v) \tag{9}$$


The main advantages of this method are that it is simple, easy to implement and
often gives good results. Its disadvantage is the initial assumption of independence
between features [19, 26, 37]. This method has been used for intrusion detection
[5, 38].
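
The following Python sketch shows a naive Bayesian classifier for categorical features. Probabilities are estimated by relative frequencies, which is an assumption of this sketch; no smoothing is applied, so an unseen feature value gives a zero likelihood. The feature values and labels are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Estimate P(Y) and P(X_i = value | Y) by relative frequencies."""
    class_counts = Counter(labels)
    priors = {y: c / len(labels) for y, c in class_counts.items()}
    counts = defaultdict(Counter)        # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, y)][value] += 1

    def likelihood(i, value, y):
        return counts[(i, y)][value] / class_counts[y]

    return priors, likelihood


def predict(priors, likelihood, row):
    """MAP rule: maximize P(Y) * prod_i P(X_i | Y); P(X) is a common factor."""
    scores = {}
    for y, prior in priors.items():
        score = prior
        for i, value in enumerate(row):
            score *= likelihood(i, value, y)
        scores[y] = score
    return max(scores, key=scores.get), scores


# Hypothetical connections described by (protocol, flag)
rows = [("tcp", "SF"), ("udp", "SF"), ("tcp", "SF"), ("tcp", "S0")]
labels = ["normal", "normal", "normal", "attack"]
priors, likelihood = train_naive_bayes(rows, labels)
label, scores = predict(priors, likelihood, ("tcp", "S0"))
```

For the query `("tcp", "S0")` the score of the class "normal" is zero because "S0" never occurs with that class in the sample, which is why practical implementations usually add a smoothing term.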


4 Clustering 


Clustering methods group data without class labels and solve the problem from the
observations alone, so they are known as unsupervised algorithms. The aim of
clustering is to find sets of data, called clusters, that satisfy two conditions:
the members of a cluster have maximum similarity to each other and maximum
difference from the members of other clusters [26].


4.1 Partitioning Methods 


Partition-based methods partition the input data until the two conditions above are
satisfied. Once the number of clusters and the cluster centers are determined, the
algorithm starts to work and improves the output based on clustering policies.


4.1.1 K-Means


K-means is a simple clustering method. The algorithm receives the number of clusters
K and finds K means as centroids that satisfy the two conditions. It needs a metric
to evaluate the distance between observations; many metrics exist, but the Euclidean
distance is used here. K-means thus reduces to solving an optimization problem with
the objective function shown in Eq. (10).


$$E = \sum_{i=1}^{k} \sum_{x_p \in C_i} \mathrm{dist}(x_p, c_i) \tag{10}$$


In Eq. (10) the set of observations is $X = (x_1, x_2, \ldots, x_n)$, where every input $x_i$ is
d-dimensional, $C = \{C_1, C_2, \ldots, C_k\}$ is the set of clusters with centers
$c_1, \ldots, c_k$, and $k < n$. The algorithm initializes the centers randomly and changes
them iteratively until it reaches the best clustering it can find. Some weaknesses of
K-means are that the computed average of the data (observations) may not be an
actual data point, that the algorithm is not guaranteed to find the optimal solution,
and that finding the optimal clustering has high computational complexity (NP-hard).
The main advantages are its simplicity and its applicability to a wide variety of
data types. Despite these disadvantages, it is still used in the IDS field for
clustering or preprocessing, see [3, 7, 10, 13, 36, 39].
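
The iteration described above can be sketched in Python as follows. To keep the example reproducible, the initial centers are passed in explicitly instead of being drawn at random, which is a simplification of this sketch; the four sample points are hypothetical.

```python
import math


def dist(x, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))


def assign(points, centers):
    """Index of the nearest center for every observation."""
    return [min(range(len(centers)), key=lambda i: dist(p, centers[i])) for p in points]


def update(points, labels, centers):
    """Move each center to the mean of its cluster (kept unchanged if empty)."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == i]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def k_means(points, centers, max_iter=100):
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:       # converged: the centers stopped moving
            break
        centers = new_centers
    return centers, labels


def objective(points, centers, labels):
    """Eq. (10): summed distance of each observation to its cluster center."""
    return sum(dist(p, centers[l]) for p, l in zip(points, labels))


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, labels = k_means(points, [(0.0, 0.0), (10.0, 0.0)])
E = objective(points, centers, labels)
```

In this run the final centers are the midpoints (0, 1) and (10, 1), which are not observations themselves; this is the first weakness mentioned in the text.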


4.2 Hierarchical Methods 


Hierarchical clustering, or hierarchical cluster analysis (HCA), is based on
building a tree structure. Unlike partitioning methods, it does not require the
number of clusters, but it needs a metric to calculate the difference between data.
The method builds the tree structure in a top-down or bottom-up fashion [37].


4.2.1 Agglomerative


Agglomerative hierarchical clustering, or AGNES (agglomerative nesting), is a
bottom-up clustering method. The algorithm starts with a data set, initially
considers each input as a cluster and continues merging until all data are
integrated in one cluster [37, 40]. The main advantages of this method are that it is
easy to implement and does not need the number of clusters. Its disadvantages are
the lack of backtracking and a time complexity of $O(n^3)$, where n is the number of
data points. Hierarchical clustering has been used for IDSs [41].
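
The bottom-up merging can be sketched in Python. The text does not specify how the distance between two clusters is measured; single linkage (the smallest distance between any two members) is assumed here. The one-dimensional sample points are hypothetical.

```python
import math


def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single_linkage(c1, c2, points):
    """Cluster distance = smallest distance between any two members (assumed linkage)."""
    return min(dist(points[i], points[j]) for i in c1 for j in c2)


def agnes(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(points))]      # every point starts alone
    history = []
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]], points))
        history.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history


points = [(0.0,), (1.0,), (5.0,), (5.5,)]
history = agnes(points)   # merges points 2 and 3 first, then 0 and 1, then everything
```

The merge history is the tree structure: reading it from the last entry to the first gives the top-down view of the same hierarchy. Once two clusters are merged the decision is never revisited, which is the lack of backtracking mentioned above.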


4.3 Density-Based Methods 


These methods cluster a data set based on the number of observations distributed in
the neighborhood of each observation. A neighborhood is characterized by a certain
diameter in each cluster. Density-based spatial clustering of applications with
noise (DBSCAN) and ordering points to identify the clustering structure (OPTICS)
are two density-based methods [26, 37].


4.3.1 DBSCAN 


DBSCAN selects a point of the data set and checks its status with two parameters,
$\varepsilon$ and Min-Pts. $\varepsilon$ is the neighborhood radius that determines the neighbors of a
data point. Min-Pts is the number of data points required to form a dense region.
This method is known as a center-based approach or density reachability. The main
advantages of DBSCAN are that it is robust to outliers, it can describe complex
patterns, it does not require the number of clusters, and its time complexity is
$O(n \log n)$. The main disadvantages are that the algorithm does not work well if the
data set has varying densities and that determining the values of $\varepsilon$ and Min-Pts
is a further challenge [3]. For further study of this methodology, see [3].
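
The role of the two parameters can be seen in the following Python sketch of DBSCAN. It uses a plain linear scan for the neighborhood query, so it does not reach the O(n log n) complexity, which requires a spatial index. The parameter names `eps` and `min_pts` and the sample points are hypothetical.

```python
import math


def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster id 0, 1, ... or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE             # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster       # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)     # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The isolated point (50, 50) has no other point within the radius and is labeled as noise, while the two dense groups become separate clusters without the number of clusters being given in advance.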


4.4 Model-Based Methods 


Model-based methods assume there is a model that generates data set and these 
methods try to approximate this model. 
4.4.1 SOM


Kohonen self-organizing maps, or self-organizing feature maps (SOFM or SOM), are
a class of competitive artificial neural networks. A SOM clusters input data based
on the discovered model and acts through three processes:


(1) Competition: the neurons in the output layer compete and one of them wins to
generate the output. The competition criterion is calculated with a specific
function such as the Euclidean distance.

(2) Cooperation: when a neuron (or node) wins, it becomes the center of a
neighborhood, and the other nodes in this neighborhood follow this center.

(3) Adaptation: the nodes in the neighborhood of the winner update their weights so
that the chance of this winner succeeding again increases.


The advantage of SOM is its learning process, which improves performance through
competition. The disadvantages of SOM are that the number of clusters must be
determined before the learning process, that it lacks a specific objective function
and that it is not guaranteed to find the best solution. SOM has been used as an IDS
in [5, 7, 10].
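
The three processes can be sketched as one training step of a one-dimensional map in Python. The neighborhood is assumed to be a fixed index radius around the winner and the update moves the weights a fixed fraction toward the input; both are simplifying assumptions of this sketch. The map size, weights and parameter values are hypothetical.

```python
import math


def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def best_matching_unit(weights, x):
    """Competition: the node whose weight vector is closest to x wins."""
    return min(range(len(weights)), key=lambda k: dist(weights[k], x))


def som_step(weights, x, learning_rate=0.5, radius=1):
    """One update of a 1-D map (node k sits at grid position k)."""
    winner = best_matching_unit(weights, x)
    updated = []
    for k, w in enumerate(weights):
        if abs(k - winner) <= radius:     # cooperation: winner and its grid neighbors
            # adaptation: move the weight vector toward the input
            w = [wi + learning_rate * (xi - wi) for wi, xi in zip(w, x)]
        updated.append(list(w))
    return winner, updated


weights = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [9.0, 9.0]]   # hypothetical map of 4 nodes
winner, weights_after = som_step(weights, [0.0, 0.0], learning_rate=0.5, radius=1)
```

Training repeats this step over all inputs, usually while shrinking the learning rate and the radius; the number of nodes, which bounds the number of clusters, has to be fixed beforehand.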


5 Experiment and Results 


In this section, the results of the introduced methods are compared. All experiments
were performed with MATLAB.


5.1 Data Set Analysis 


The KDD CUP 99 data set is used to test novel methods. This data set is built from
the actual network traffic of DARPA'98. KDD CUP 99 contains two groups, normal
and attack behavior, where the attacks are of several types: DoS, R2L, U2R and probe
[42].


(1) Normal connections (NRL): normal behavior, such as opening and visiting
web pages and downloading files.

(2) Denial of Service (DoS): the attacker tries to keep the system busy so that
legitimate requests are not executed correctly.

(3) Remote to Local (R2L): the remote attacker tries to penetrate the victim
machine and then misuses the account to send packets.

(4) User to Root (U2R): the intruder tries to gain access as a normal user and then
becomes the root user.

(5) Probe (PRB): the intruder tries to gain information about the target host and
find possible vulnerabilities.

Table 1 List of attacks

DoS      | R2L          | U2R             | PRB
Back     | Ftp-write    | Buffer-overflow | Ipsweep
Land     | Guess-passwd | Loadmodule      | Nmap
Neptune  | Imap         | Perl            | Portsweep
Pod      | Multihop     | Rootkit         | Satan
Smurf    | Phf          |                 |
Teardrop | Spy          |                 |
         | Warezclient  |                 |
         | Warezmaster  |                 |

The data set consists of a large number of network connections belonging to normal
and malicious groups. Each connection is represented by a 41-dimensional feature
vector that includes 34 numerical features and 7 symbolic features. Table 1 shows
more details of the attacks [13, 18, 24, 43, 44].


5.2 Evaluation Measures


Evaluation measures are based on Table 2, which is known as the confusion
matrix. True negatives (TN) and true positives (TP) correspond to correct
predictions. A false negative (FN) is attack behavior wrongly detected as normal,
and a false positive (FP) is normal behavior wrongly detected as an attack. The
accuracy measure is defined by Eq. (11); it combines all four quantities
[7, 10, 26, 37, 45].


$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$


The last criterion is precision, which is shown in Eq. (12). Of course, there are
several other evaluation criteria that are not considered here.


— TP 
Precision = —————_ (12) 
TP+FP 
Table 2, Confusion matrix Actual Predicted normal Predicted attack 


Normal ™ FP 
Attack FN TP 
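
As a small illustration of Eqs. (11) and (12), the following Python sketch computes both measures from the four confusion-matrix counts. The function names and the counts are hypothetical and chosen only for this example; they are not taken from the experiments.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (11): share of all connections that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Eq. (12): share of predicted attacks that really are attacks."""
    return tp / (tp + fp)


# Hypothetical counts: no false alarms, but 18 of 49 attacks are missed.
tp, fn, fp, tn = 31, 18, 0, 49
print(precision(tp, fp))         # 1.0
print(accuracy(tp, tn, fp, fn))  # 80/98, roughly 0.816
```

The example shows why a single criterion can mislead: a detector that raises no false alarms reaches a precision of 1.0 even though it misses many attacks, which only the accuracy reveals.
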

5.3 Results and Analysis


The experiments consist of four parts. In the first part, the number of inputs is 38,
corresponding to the numerical features of the data set, and the output consists of two
classes, “intrusion” or “normal”. In the second part, the output contains the five classes
introduced in Sect. 5.1. In the third part, the number of inputs is 41, covering all
features, and the output is “intrusion” or “normal”. In the fourth part, the number of
inputs is 41 and the output includes five classes. The MLP is configured according
to these descriptions; the hidden layer has 20 neurons and the activation
functions are sigmoid.

Table 3 shows the performance of the MLP; the accuracy and precision values are good.
The first two rows of the table belong to the first part and the last two rows to
the third part. Similarly, the results of the second and the fourth part are shown in
Table 4. The precision for U2R in the second part in Table 4 is 100%, but the accuracy
is not very good. Therefore, several criteria should be considered when comparing the
results of an algorithm.

RBF was the second classifier; its results are shown in Tables 5 and 6. As
can be observed, the RBF results are weaker than those of the MLP (accuracy and
precision values are lower). The main reason is that RBF operates on the distance
between inputs and is sensitive to the number of data samples [46].


Table 3 The results of the first and third experiments 


MLP FN (%) FP (%) Precision (%) Accuracy (%) 
Attack 0.24 0.16 99.84 99.80 
Normal 0.49 0.74 99.26 99.39 
Attack 0.066 0.11 99.89 99.91 
Normal 0.33 0.21 99.79 99.73 


Table 4 The results of the second and fourth experiments 


MLP FN (%) FP (%) Precision (%) Accuracy (%) 
DoS 0.16 0.06 99.94 99.89 
NRL 1.11 1.15 98.85 98.87 
PRB 1.12 1.13 98.68 98.78 
R2L 0.00 1.90 98.14 99.05 
U2R 36.73 0.00 100.0 81.64 
DoS 0.12 0.12 99.88 99.88 
NRL 0.45 0.70 99.30 99.43 
PRB 0.46 0.20 99.80 99.67 
R2L 0.10 1.40 98.62 99.25 
U2R 34.69 4.08 94.12 80.62 


Table 5 The results of the first and third experiments


RBF FN (%) FP (%) Precision (%) Accuracy (%)
Attack 6.35 0.16 99.83 96.74
Normal 0.49 19.78 83.42 89.86
Attack 6.50 0.092 99.90 96.70
Normal 0.29 20.23 83.13 89.74


Table 6 The results of the second and fourth experiments


RBF FN (%) FP (%) Precision (%) Accuracy (%)
DoS 8.82 0.44 99.52 95.37
NRL 0.21 29.65 77.09 85.07
PRB 7.89 2.11 97.76 95.00
R2L 19.60 0.00 100.0 90.20
U2R 26.53 0.00 100.0 86.73
DoS 8.90 0.24 99.74 95.43
NRL 0.33 31.83 75.79 83.92
PRB 7.37 0.66 99.29 95.98
R2L 21.80 0.00 100.0 89.10
U2R 26.53 0.00 100.0 86.73


Table 7 The results of the first and third experiments


SVM FN (%) Precision (%) Accuracy (%)
Attack 24.34 100.0 87.83
Normal - - -
Attack 24.33 100.0 87.83
Normal - - -


Table 8 The results of the first and third experiments 


DT FN (%) FP (%) Precision (%) Accuracy (%) 
Attack 0.00 0.00 100.0 100.0 
Normal 0.00 0.00 100.0 100.0 
Attack 0.00 0.00 100.0 100.0 
Normal 0.00 0.00 100.0 100.0 


The results of SVM are shown in Table 7. SVM is a binary classifier, so only
the first and third experiments apply. The results of the decision tree (DT) are shown
in Table 8; its performance is good because a DT creates rules based on the input
data set and classifies according to these rules. The results of the second and fourth
experiments are shown in Table 9.


Table 9 The results of the second and fourth experiments


DT FN (%) FP (%) Precision (%) Accuracy (%)
DoS 0.02 0.00 100.0 99.99
NRL 0.12 0.29 99.71 99.80
PRB 0.13 0.26 99.74 99.80
R2L 0.30 0.00 100.0 99.85
U2R 6.12 2.04 97.87 95.92
DoS 0.00 0.00 100.0 100.0
NRL 0.12 0.00 100.0 99.94
PRB 0.00 0.00 100.0 100.0
R2L 0.00 0.00 100.0 100.0
U2R 0.00 6.12 98.89 96.94


Table 10 The results of the first and third experiments


KNN FN (%) FP (%) Precision (%) Accuracy (%)
Attack 0.66 0.17 99.83 99.58
Normal 0.17 0.66 99.34 99.58
Attack 0.57 0.13 99.87 99.65
Normal 0.13 0.57 99.43 99.65

KNN is introduced as a lazy learner. The results of the first and third experiments
are shown in Table 10 and they are good. The number of neighbors is five; it is a
free parameter that must be set in advance. The results of the second and fourth
experiments are shown in Table 11 and they are good, except for the U2R attack:
the number of U2R samples is lower than for the other attacks, which is why the
U2R classification is less accurate.

The results of the first and third experiments of the Naive Bayes Classifier (NBC) are
shown in Table 12. Precision and accuracy for attacks are better than for normal
traffic, perhaps because there are more attack samples than normal samples. The
results of the second and fourth experiments are presented in Table 13. As the NBC
results show, its performance is not very good, because the features are not
truly independent.

K-means was tested, and the results of the first and third experiments are presented in
Table 14. The results of the third experiment are better than those of the first, because
more features were added to the test. The results of the second and fourth experiments
are given in Table 15; as the table shows, the results are average, and the clustering
error increases as the number of samples decreases. The results of agglomerative
clustering are presented in Tables 16 and 17. As expected, these results are better than
those of K-means, because this method does not need to determine the cluster centroids
and because K-means is not guaranteed to find the optimal solution. DBSCAN results
are shown in Table 18. DBSCAN differs from the other clustering algorithms in that it
determines the number of clusters itself, so its results differ from the rest. As the
results show, this method is only able to cluster the attacks, and even this cluster does
not perform well in the first experiment (about 60%). Adding features improves this
performance.


Table 11 The results of the second and fourth experiments


KNN FN (%) FP (%) Precision (%) Accuracy (%)
DoS 0.08 0.62 99.38 99.65
NRL 0.17 0.66 99.34 99.58
PRB 0.38 0.40 99.60 99.61
R2L 0.033 0.80 99.21 99.59
U2R 0.13 10.0 90.90 94.94
DoS 0.06 0.42 99.58 99.76
NRL 0.13 0.57 99.43 99.65
PRB 0.25 0.27 99.73 99.74
R2L 0.011 0.60 99.40 99.70
U2R 0.14 10.26 90.68 94.80


Table 12 The results of the first and third experiments


NBC FN (%) FP (%) Precision (%) Accuracy (%)
Attack 7.89 0.90 99.03 95.61
Normal 2.80 24.55 79.84 86.32
Attack 7.08 1.33 98.59 95.80
Normal 4.15 22.04 81.74 88.31


Table 13 The results of the second and fourth experiments


NBC FN (%) FP (%) Precision (%) Accuracy (%)
DoS 2.02 20.26 82.87 97.86
NRL 20.27 88.84 47.42 45.67
PRB 100.0 0.00 - 50.00
R2L 100.0 0.00 - 50.00
U2R 100.0 0.00 - 50.00
DoS 0.58 0.46 99.54 99.48
NRL 4.15 1.03 98.94 97.41
PRB 4.87 2.24 97.70 96.45
R2L 2.20 2.40 97.60 97.70
U2R 26.53 - - -


Table 14 The results of the first and third experiments


FN (%)
FP (%)
75.68
100.0


Table 15 The results of the second and fourth experiments


K-means FN (%) FP (%) Precision (%) Accuracy (%)
DoS 0.05 81.17 55.21 59.21
NRL 0.47 98.67 50.22 50.33
PRB 12.02 99.94 46.82 44.02
R2L 24.32 100.0 43.08 37.84
U2R 50.29 100.0 33.20 24.85
DoS 0.05 81.07 55.21 59.44
NRL 0.47 98.67 50.22 50.33
PRB 10.00 100.0 47.37 45.00
R2L 29.27 100.0 41.43 35.37
U2R 50.29 100.0 33.23 24.85


Table 16 The results of the first and third experiments


Agglomerative FN (%) FP (%) Precision (%) Accuracy (%)
Attack 0.013 32.13 75.68 62.16
Normal 100.0 0.041 0.00 49.98


Table 17 The results of the second and fourth experiments


Agglomerative FN (%) FP (%) Precision (%) Accuracy (%)
DoS 94.10 0.00 100.0 52.95
NRL 0.00 98.54 50.37 50.73
PRB 99.51 0.041 92.28 50.22
R2L 100.0 0.065 0.00 49.96
U2R 100.0 2.04 0.00 48.98


Table 18 The result of DBSCAN


DBSCAN FN (%) FP (%) Precision (%) Accuracy (%)
Attack 36.36 32.13 66.45 65.75
Normal 100.0 - - -

The SOM results of the first and third experiments are in Table 19. In the second
experiment this method has a good performance, and the results of the second and
fourth experiments are in Table 20. All results emphasize that no single method alone
is sufficient to design a good system for detecting every intrusion. Some classification
methods perform better; nevertheless, combining these methods seems to be a good
idea. In particular, mixing classification and clustering methods should give better
output, and this idea has been used; for further study see [39, 47-51].


Table 19 The results of the first and third experiments


SOM FN (%) FP (%) Precision (%) Accuracy (%)
Attack 99.99 0.00 100.0 50.01
Normal 0.00 - - -


Table 20 The results of the second and fourth experiments


SOM FN (%) FP (%) Precision (%) Accuracy (%)
DoS 100.0 0.002 0.00 49.99
NRL 100.0 0.00 - 50.00
PRB 100.0 3.88 0.00 48.06
R2L 99.90 - - -
U2R 0.00 - - -


6 Conclusions 


The security of computer networks is a real challenge that must be addressed
effectively. This paper reviews machine learning methods for improving the
performance of IDS. Classification and clustering algorithms have been applied and
tested on the KDD data set. ANNs perform better than the other methods, so they
can be used alone, but a combination of several methods seems more
reliable. For example, clustering methods can act as preprocessing and classification
methods as the IDS.


Leaf Recognition Based on Image Processing


Hashem Bagherinezhad and Marjan Kuchaki Rafsanjani 


Abstract Plants are among the most important and most widely used resources for
humans and are closely tied to human life. This close relationship creates the
need to distinguish plant species. Originally, recognition was carried out by human
experts, which has limitations and problems. With the rapid development of
image processing and pattern recognition technology, automatic detection of plant
species based on image processing has become possible. In recent years, this problem
has been studied by many researchers and many methods have been presented. In this
paper, we first review some of the methods proposed in recent years. Second, we
introduce 31 features related to the texture, shape, and color of leaves and evaluate
each of them using a neural network. We then introduce three categories for the
proposed methods and assign each method to one of them. Finally, we present the
problems and limitations in this field as guidelines for future work.


Keywords Leaf recognition - Leaf shape - Leaf texture - Leaf color - Neural 
network 


1 Introduction 


Plants are among the most important and most widely used resources for humans
on earth, who use them in fields such as botany and therapeutics. The first need in
this field is therefore to recognize and distinguish between different plant species,
which in many cases closely resemble one another. A system that does this is referred
to as a leaf recognition system [1]. There are many reasons for needing such a system;
for example, if a toxic plant species is recognized as non-toxic, the consequences may
be irreparable. So far, this need has been met by botanists and experts in the field.
They examine the plant manually and recognize it according to their experience and
abilities. This is not a suitable method, because one person cannot remember all
the species in the world, and the manual method is also time-consuming. Therefore,
researchers have tried to simulate the botanist's recognition with artificial intelligence
algorithms and to overcome these disadvantages. Research has shown that studying
leaf features is necessary and sufficient for determining the type of a plant [2].


Fig. 1 General framework for plant species recognition systems

The general framework for leaf recognition systems is shown in Fig. 1. Such a
system recognizes plant species in three main steps: image pre-processing, feature
extraction, and classification [3]. In the pre-processing step, which precedes feature
extraction, operations such as noise removal and image segmentation are applied to
the leaf image [3]. After pre-processing, feature extraction is performed. The
extractable features of the image fall into three categories, described in Sect. 2: leaf
shape features, leaf texture features, and leaf color features [3]. In the classification
step, all extracted features are assembled into a feature vector; these feature vectors
are classified by the classifier, which yields the recognition result [3].

In this article, we list the methods presented in recent years and categorize them
according to the feature extraction step. We also introduce 31 features that describe
the leaf image and evaluate them using a neural network. We then compare the methods
by recognition accuracy and state their advantages and disadvantages. The paper is
structured as follows. Section 2 introduces the extractable features. Section 3 presents
the results of evaluating these features. Section 4 lists the methods presented in recent
years, compares them, and presents their advantages and disadvantages. Finally,
Sect. 5 summarizes the results of this article.


2 Extractable Features 


As mentioned in the previous section, the extractable features of the leaf image are 
divided into three categories: the leaf shape features, the leaf texture features, the 
leaf color features. In this section, we introduce the features related to each category. 
First, Table 1 lists the common variables that appear in the formulas for
these features.


Table 1 Some common variables [3]


Variable | Meaning of the variable
W | The width of the exterior rectangle
L | The length of the target region, i.e. the length of its minimum enclosing rectangle
P | The perimeter of the target region, counted as the number of pixels making up the target image boundary
A | The area of the target region, i.e. the number of pixels in the target image
$(\bar{x}, \bar{y})$ | The centroid coordinate of the target region
$P(i, j)$ | The probability that the gray levels i and j appear, respectively, i, j = 0, 1, ..., G-1
G | The number of gray levels
$P(z_i)$ | The corresponding histogram, i = 0, 1, ..., G-1
$z_i$ | The gray level of random variables
M | The length of an image
N | The width of an image
$\mu$ | Mean of color


2.1 Leaf Shape Features 


Leaf shape is one of the most important leaf descriptors; defining proper features
for it allows recognition to be performed correctly. The extraction of leaf shape
features is based on image segmentation. After segmentation, the image contains
border pixels and region pixels. Thus, shape features are divided into two categories:
border-based features and region-based features [4]. The leaf shape features are as
follows [3].


2.1.1 Aspect Ratio 


The aspect ratio is the ratio of the width to the length of the external
rectangle of the leaf contour. It is defined as:


$R_{ar} = W / L$ (1)


2.1.2 Circularity

Circularity is defined as:


$C = P^2 / (4 \pi A)$ (2)


where A is the area and P is the perimeter, i.e. the number of pixels in the target
area and the number of pixels on its boundary, respectively. They are defined as
follows:


$A = \sum_{i} \sum_{j} I(i, j)$ (3)

$P = N_e + \sqrt{2}\, N_o$ (4)


where $I(i, j)$ is the image pixel at (i, j), $N_e$ is the number of even-numbered
chain codes and $N_o$ is the number of odd-numbered chain codes.
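
Eq. (4) can be illustrated with a short Python sketch. It assumes an 8-direction Freeman chain code, in which even codes are horizontal or vertical steps of length 1 and odd codes are diagonal steps of length $\sqrt{2}$; the function name and the example boundary are hypothetical.

```python
import math


def chain_code_perimeter(chain):
    """Perimeter from an 8-direction chain code, following Eq. (4)."""
    n_even = sum(1 for c in chain if c % 2 == 0)  # horizontal/vertical steps
    n_odd = len(chain) - n_even                   # diagonal steps
    return n_even + math.sqrt(2) * n_odd


# Hypothetical boundary: a small diamond traced by four diagonal steps.
print(chain_code_perimeter([1, 3, 5, 7]))  # 4 * sqrt(2), about 5.657
```
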
2.1.3 Area Convexity 


The area convexity is the ratio between the area of the target and the area of its convex
hull and is defined as follows:


$R_{Aconv} = A / A_C$ (5)


where $A_C$ is the area of its convex hull.


2.1.4 Perimeter Convexity


The perimeter convexity is the ratio between the perimeter of the target and the
perimeter of its convex hull and is defined as follows:


$R_{Pconv} = P / P_C$ (6)


where $P_C$ is the perimeter of its convex hull.


2.1.5 Rectangularity


This feature shows the similarity between the target region and its enclosing rectangle
and is defined as follows:


$R = A / A_R$ (7)


where $A_R$ is the rectangular area (the minimum enclosing rectangle of the target region).


2.1.6 Compactness

Compactness is defined as:


$R_{com} = P^2 / A$ (8)
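
The ratio features of Eqs. (1), (2) and (5)-(8) need only a handful of measurements. The following Python sketch collects them in one function; the function name, the dictionary keys and the example values are hypothetical, and the rectangle area is assumed to be W times L.

```python
import math


def shape_ratios(W, L, P, A, A_c, P_c):
    """Ratio-based shape features from basic measurements of the leaf region."""
    A_r = W * L  # assumption: area of the minimum enclosing rectangle
    return {
        "aspect_ratio": W / L,                    # Eq. (1)
        "circularity": P ** 2 / (4 * math.pi * A),  # Eq. (2)
        "area_convexity": A / A_c,                # Eq. (5)
        "perimeter_convexity": P / P_c,           # Eq. (6)
        "rectangularity": A / A_r,                # Eq. (7)
        "compactness": P ** 2 / A,                # Eq. (8)
    }


# Idealized 10 x 10 square: it fills its rectangle and its convex hull.
features = shape_ratios(W=10, L=10, P=40, A=100, A_c=100, P_c=40)
print(features["compactness"])  # 16.0
```
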


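2.1.7 Hu’s Invariant Moment


This feature describes the outline of the target region and is defined by formulas
(9)-(20):


$\phi_1 = \eta_{20} + \eta_{02}$ (9)

$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$ (10)

$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$ (11)

$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$ (12)

$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$ (13)

$\phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$ (14)

$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$ (15)

$\phi_8 = 2\{\eta_{11}[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] - (\eta_{20} - \eta_{02})(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})\}$ (16)

$\phi_9 = [(\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12}) + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})](\eta_{20} - \eta_{02}) + 2\eta_{11}[(3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12}) - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})]$ (17)

$\phi_{10} = [(3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12}) - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})](\eta_{20} - \eta_{02}) - 2\eta_{11}[(\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12}) + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})]$ (18)

$\phi_{11} = (\eta_{04} + \eta_{40} - 6\eta_{22})^2 + 16(\eta_{31} - \eta_{13})^2$ (19)

$\phi_{12} = (\eta_{04} + \eta_{40} - 6\eta_{22})[(\eta_{20} - \eta_{02})^2 - 4\eta_{11}^2] + 16\eta_{11}(\eta_{31} - \eta_{13})(\eta_{20} - \eta_{02})$ (20)


The normalized central moment is:

$\eta_{pq} = \mu_{pq} / \mu_{00}^{\gamma}$ (21)

where the central moment of order (p + q) of the image f(x, y) is:

$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y)$ (22)

and $\gamma$ is:

$\gamma = \frac{p + q}{2} + 1, \quad p + q = 2, 3, \ldots$ (23)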
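
As a rough sketch of how Eqs. (21)-(23) feed into the first two invariants of Eqs. (9) and (10), the Python code below computes them for a small image stored as a list of rows. The function names and the test image are hypothetical, and only $\phi_1$ and $\phi_2$ are computed.

```python
def central_moment(img, p, q):
    """Central moment mu_pq of Eq. (22); img[y][x] holds f(x, y)."""
    m00 = sum(sum(row) for row in img)
    x_bar = sum(x * v for row in img for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(img) for v in row) / m00
    return sum((x - x_bar) ** p * (y - y_bar) ** q * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def eta(img, p, q):
    """Normalized central moment of Eq. (21) with gamma from Eq. (23)."""
    gamma = (p + q) / 2 + 1
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** gamma


def hu_first_two(img):
    """The invariants phi_1 and phi_2 of Eqs. (9) and (10)."""
    n20, n02, n11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2


bar = [[1, 1, 1, 1], [1, 1, 1, 1]]        # a 4 x 2 bar
rotated = [[1, 1], [1, 1], [1, 1], [1, 1]]  # the same bar rotated by 90 degrees
print(hu_first_two(bar))      # (0.1875, 0.015625)
print(hu_first_two(rotated))  # (0.1875, 0.015625)
```

Both orientations of the bar give the same values, which is the rotation invariance that makes these moments useful as shape features.
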


2.1.8 Irregularity


The irregularity is the ratio of the radius of the inscribed circle of the target area to
the radius of its circumcircle:


$R_s = r / R$ (24)


where r is the radius of the inscribed circle and R is the radius of the circumcircle.


2.1.9 Eccentricity

The formula for this feature is as follows:


$e = R_{max} / R_{min}$ (25)


where $R_{max}$ is the maximum distance from the center to the boundary of the target
area and $R_{min}$ is the minimum distance from the center to the boundary of the
target area.


2.1.10 Length/Perimeter

The formula for this feature is as follows:


$R_{LP} = L / P$ (26)


2.1.11 Narrow Factor

The narrow factor feature is defined as follows:

$R_{NF} = D / L_p$ (27)


where D is the longest distance between two points of the target area and $L_p$
is the long axis length of the target area.


2.1.12 Perimeter Ratio of Physiological Length and Width


This feature is expressed as:


$R_{PLW} = P / (L + W)$ (28)

2.1.13 Zernike Moment

The Zernike moment is defined as:

$Z_{pq} = \frac{p + 1}{\pi} \sum_{x} \sum_{y} f(x, y)\, V_{pq}^{*}(x, y)$ (29)


where $V^{*}$ is the complex conjugate of V,


$V_{pq}(x, y) = V_{pq}(r\cos\theta, r\sin\theta) = R_{pq}(r)\, e^{iq\theta}$ (30)

$R_{pq}(r) = \sum_{s=0}^{(p-|q|)/2} (-1)^s \frac{(p - s)!}{s! \left(\frac{p+|q|}{2} - s\right)! \left(\frac{p-|q|}{2} - s\right)!}\, r^{p-2s}$ (31)


where $p = 0, 1, 2, \ldots$, $i = \sqrt{-1}$, $p - |q|$ is even, r is the radius, and
$\theta$ is the angle between r and the x-axis. And

$V_{pq}(x, y) = V_{pq}(\rho, \theta) = R_{pq}(\rho)\, e^{iq\theta}$ (32)


where $p \ge 0$, $p - |q|$ is even and $|q| \le p$.
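
The radial part of the Zernike basis function is a finite sum and is easy to check numerically. The Python sketch below evaluates the standard Zernike radial polynomial $R_{pq}(r)$, which is assumed here to be the polynomial intended in this section; the function name is hypothetical.

```python
from math import factorial


def zernike_radial(p, q, r):
    """Standard Zernike radial polynomial R_pq(r) for |q| <= p and p - |q| even."""
    q = abs(q)
    if q > p or (p - q) % 2 != 0:
        raise ValueError("requires |q| <= p and p - |q| even")
    return sum(
        (-1) ** s * factorial(p - s)
        / (factorial(s) * factorial((p + q) // 2 - s) * factorial((p - q) // 2 - s))
        * r ** (p - 2 * s)
        for s in range((p - q) // 2 + 1)
    )


print(zernike_radial(2, 0, 0.5))  # 2 * 0.5**2 - 1 = -0.5
```

For p = 2 and q = 0 the sum reduces to $2r^2 - 1$, which gives a quick sanity check of the factorial terms.
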


2.1.14 Generic Fourier Descriptor 


The Generic Fourier Descriptor (GFD or GFT) is defined as follows:


$GFD = \left\{ \frac{|PF(0,0)|}{2\pi r^2}, \frac{|PF(0,1)|}{|PF(0,0)|}, \ldots, \frac{|PF(0,n)|}{|PF(0,0)|}, \ldots, \frac{|PF(m,0)|}{|PF(0,0)|}, \ldots, \frac{|PF(m,n)|}{|PF(0,0)|} \right\}$ (33)

where PF is as follows:

$PF(\rho, \phi) = \sum_{r} \sum_{i} f(r, \theta_i) \exp\left[ j 2\pi \left( \frac{r}{R}\rho + \frac{2\pi i}{T}\phi \right) \right]$ (34)


where $0 \le r < R$, $\theta_i = i(2\pi/T)$ $(0 \le i < T)$, $0 \le \rho < R$, $0 \le \phi < T$, R is the
radial frequency resolution and T is the angular frequency resolution,


$r = \sqrt{(x - \bar{x})^2 + (y - \bar{y})^2}$ (35)

$\theta = \arctan\frac{y - \bar{y}}{x - \bar{x}}$ (36)


2.1.15 Elliptical Eccentricity


This feature is the eccentricity of the ellipse that has the same standard
second central moment as the target region and is expressed as follows:


$e = c / a$ (37)


where c is the half focal length of the ellipse and a is its semi-major axis.


2.1.16 Center Distance Sequence (CDS)

This feature is presented as follows:

$CDS = \{ D(x_i, y_i) \mid 0 \le i \le P - 1 \}$ (38)


where D(x, y) is as follows:


$D(x, y) = \sqrt{(x - \bar{x})^2 + (y - \bar{y})^2}$ (39)
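
A minimal Python sketch of the center distance sequence of Eqs. (38) and (39) is given below. It assumes the boundary is available as an ordered list of (x, y) points and that the centroid of the region has already been computed; the function name and the example points are hypothetical.

```python
import math


def center_distance_sequence(boundary, centroid):
    """Distances from the centroid to each boundary point, Eqs. (38)-(39)."""
    x_bar, y_bar = centroid
    return [math.hypot(x - x_bar, y - y_bar) for x, y in boundary]


# Hypothetical boundary: the four corners of a 2 x 2 square centered at (1, 1).
corners = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(center_distance_sequence(corners, (1, 1)))  # four values, each sqrt(2)
```
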


2.2 Leaf Texture Features 


Leaf texture consists of woven elements that repeat periodically. Image
texture describes properties such as smoothness and sparseness. There are three
methods for describing the texture of an image: the statistical method, the structural
method and the frequency spectrum method. To extract the leaf texture features, we
first pre-process the input image, convert it into a grayscale image, and then extract
the appropriate features [4]. The leaf texture features are as follows [3].


2.2.1 Energy

This feature reflects the degree of uniformity of the grayscale image and is defined
as follows:


$En = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} P^2(i, j)$ (40)


2.2.2 Entropy


Entropy represents the complexity or non-uniformity of the image texture and is
defined as follows:


$Ent = -\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} P(i, j) \log_2 P(i, j)$ (41)


2.2.3 Contrast


The contrast indicates the resolution of the image and its formula is as follows:


$Con = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} (i - j)^2 P(i, j)$ (42)
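
The energy, entropy and contrast features can be sketched in a few lines of Python, assuming the normalized co-occurrence matrix P is given as a G x G list of lists whose entries sum to 1. The function name and the example matrix are hypothetical; zero entries are skipped in the entropy sum, following the usual convention that 0 log 0 = 0.

```python
import math


def glcm_features(P):
    """Energy, entropy and contrast of a normalized co-occurrence matrix."""
    G = len(P)
    cells = [(i, j, P[i][j]) for i in range(G) for j in range(G)]
    energy = sum(p ** 2 for _, _, p in cells)                     # Eq. (40)
    entropy = -sum(p * math.log2(p) for _, _, p in cells if p > 0)  # Eq. (41)
    contrast = sum((i - j) ** 2 * p for i, j, p in cells)           # Eq. (42)
    return energy, entropy, contrast


# Hypothetical 2-level matrix in which all gray-level pairs are equally likely.
print(glcm_features([[0.25, 0.25], [0.25, 0.25]]))  # (0.25, 2.0, 0.5)
```
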


2.2.4 Correlation 


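This feature measures the similarity of the elements of the co-occurrence
matrix along the row or column direction, and is expressed as follows:


$Cor = \frac{\sum_{i=0}^{G-1} \sum_{j=0}^{G-1} i\, j\, P(i, j) - \mu_i \mu_j}{\sigma_i \sigma_j}$ (43)

where:

$\mu_i = \sum_{i=0}^{G-1} i \sum_{j=0}^{G-1} P(i, j)$ (44)

$\mu_j = \sum_{j=0}^{G-1} j \sum_{i=0}^{G-1} P(i, j)$ (45)

$\sigma_i^2 = \sum_{i=0}^{G-1} (i - \mu_i)^2 \sum_{j=0}^{G-1} P(i, j)$ (46)

$\sigma_j^2 = \sum_{j=0}^{G-1} (j - \mu_j)^2 \sum_{i=0}^{G-1} P(i, j)$ (47)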


2.2.5 Average 


Average is a measure of the average luminance of the texture, and it is defined by


$\mu = \sum_{i=0}^{G-1} z_i P(z_i)$ (48)


2.2.6 Inverse Difference Moment


This feature is also known as homogeneity and is defined as follows:


$IDM = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} \frac{P(i, j)}{1 + (i - j)^2}$ (49)


2.2.7 Standard Deviation and Variance


Standard deviation is a measure of the average texture contrast and is defined as follows:


$\sigma = \sqrt{\mu_2(z)}$ (50)


where:


$\mu_n = \sum_{i=0}^{G-1} (z_i - \mu)^n P(z_i)$ (51)


The variance is also expressed as follows:


$\text{Variance} = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} (i - \mu)^2 P(i, j)$ (52)


2.2.8 Maximum Probability


This feature is the strongest response of the gray level co-occurrence matrix
and is defined as follows:


$P_{max} = \max_{i, j} [P(i, j)]$ (53)


2.2.9 Uniformity


The uniformity formula is as follows:


$U = \sum_{i=0}^{G-1} P^2(z_i)$ (54)
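
The histogram-based features (average, standard deviation and uniformity) can be sketched as follows. The code assumes a normalized gray-level histogram given as a list p with p[i] = $P(z_i)$ and gray levels $z_i = i$; the function name and the example histogram are hypothetical.

```python
import math


def histogram_features(p):
    """Average, standard deviation and uniformity of a normalized histogram."""
    mean = sum(z * pz for z, pz in enumerate(p))               # Eq. (48)
    mu2 = sum((z - mean) ** 2 * pz for z, pz in enumerate(p))  # Eq. (51), n = 2
    sigma = math.sqrt(mu2)                                     # Eq. (50)
    uniformity = sum(pz ** 2 for pz in p)                      # Eq. (54)
    return mean, sigma, uniformity


# Hypothetical 3-level histogram: half the pixels are black, half are white.
print(histogram_features([0.5, 0.0, 0.5]))  # (1.0, 1.0, 0.5)
```
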


2.2.10 Entropy Sequence (EnS)

This feature was introduced by Ma et al. [5] and is defined as follows:

$EnS = -P_1[n] \log_2 P_1[n] - P_0[n] \log_2 P_0[n]$ (55)

where $P_1[n]$ and $P_0[n]$ are the probabilities that $Y_{ij}[n] = 1$ and $Y_{ij}[n] = 0$,
respectively, in the output $Y[n]$.


2.2.11 Histogram of Oriented Gradient (HOG)

This feature is an object detection feature descriptor used in machine vision and image
processing; it captures the histogram of gradient directions in a local area
of the image. It is usually used with the SVM classifier for image
recognition.


2.3 Leaf Color Features 


The third category of leaf features is the leaf color features, which are the most stable
and most reliable of the leaf's visual features. Compared to the leaf shape features,
they are more robust. They are also insensitive to the size and orientation of the
objects in the image [4]. Leaf color features are simple but effective in leaf
recognition. They are expressed as follows [3].


2.3.1 Mean of Color 


Mean is one of the most common features of data. For color images, the mean
describes the average color and is defined as:


$\mu = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} P(i, j)$ (56)


where P(i, j) is the color value of the pixel at (i, j).


2.3.2 Standard Deviation of Color


The standard deviation of color is described as follows:


$\sigma = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} (P(i, j) - \mu)^2}$ (57)


2.3.3 Skewness of Color


This feature describes the shape of the color distribution and is a
measure of its symmetry. The skewness of color is defined as:


$\theta = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} (P(i, j) - \mu)^3}{MN\sigma^3}$ (58)


where $\sigma$ is the standard deviation of color.


2.3.4 Kurtosis of Color


Kurtosis of color describes the shape of the color distribution; it measures how
sharply peaked or flat the distribution is relative to a normal distribution. It
is defined as


$\gamma = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} (P(i, j) - \mu)^4}{MN\sigma^4}$ (59)
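
The four color features of Eqs. (56)-(59) can be computed in one pass over a single color channel. The Python sketch below assumes the channel is a list of rows of numeric pixel values and that the values are not all identical (otherwise $\sigma$ is zero and the last two features are undefined); the function name and the example channel are hypothetical.

```python
import math


def color_moments(channel):
    """Mean, standard deviation, skewness and kurtosis of one color channel."""
    values = [v for row in channel for v in row]
    mn = len(values)                                               # M * N
    mu = sum(values) / mn                                          # Eq. (56)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / mn)     # Eq. (57)
    skew = sum((v - mu) ** 3 for v in values) / (mn * sigma ** 3)  # Eq. (58)
    kurt = sum((v - mu) ** 4 for v in values) / (mn * sigma ** 4)  # Eq. (59)
    return mu, sigma, skew, kurt


# Hypothetical 2 x 2 channel with a symmetric distribution of values.
mu, sigma, skew, kurt = color_moments([[0, 2], [2, 4]])
print(mu, skew)  # 2.0 0.0
```

For this symmetric example the skewness is zero, the standard deviation is $\sqrt{2}$ and the kurtosis is 2 up to rounding.
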


3 Features Evaluation 


In the previous section, features related to leaf shape, leaf texture and leaf color were
introduced. To provide a better understanding of these features, we evaluate them in
a series of experiments performed on the Leafsnap dataset [6] and the Flavia
dataset [7]. In these experiments, we use a feed-forward multi-layer perceptron neural
network trained with back-propagation, with two hidden layers and an output layer.
Each hidden layer has 20 neurons and uses the sigmoid transfer function. In the output
layer, the number of neurons equals the number of plant species in the dataset, and the
sigmoid transfer function is used as well. The input neurons have a linear transfer
function. Tables 2 and 3 show the accuracy of each feature. For easier comparison, the
accuracy of each feature is also presented in Figs. 2 and 3.
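
The network layout described above can be sketched in plain Python as a forward pass only. This is an illustration under stated assumptions, not the implementation used in the experiments: the weights are random, training by back-propagation is omitted, and all function names as well as the values of `n_features` and `n_species` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_features, n_species, seed=0):
    """Two hidden layers of 20 sigmoid neurons and a sigmoid output layer."""
    rng = random.Random(seed)
    sizes = [n_features, 20, 20, n_species]
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(x, layers):
    """Propagate a feature vector; the inputs are passed on unchanged (linear)."""
    for weights, biases in layers:
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


net = build_network(n_features=4, n_species=32)
scores = forward([0.1, 0.2, 0.3, 0.4], net)
print(scores.index(max(scores)))  # index of the predicted species
```

Each output neuron produces a score between 0 and 1, and the species with the highest score is taken as the prediction.
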


Table 2 The accuracy of each feature (Leafsnap Dataset)


Number | Feature | Accuracy (%) | Number | Feature | Accuracy (%)
1 | Aspect ratio | 34.216 | 17 | Energy | 12.872
2 | Circularity | 2.062 | 18 | Entropy | 8.163
3 | Area convexity | 31.099 | 19 | Contrast | 5.879
4 | Perimeter convexity | 13.216 | 20 | Correlation | 8.159
5 | Rectangularity | 24.369 | 21 | Average | 10.743
6 | Compactness | 22.385 | 22 | Inverse Difference Moment | 10.864
7 | Hu’s Invariant Moment | 46.651 | 23 | Standard deviation and variance | 10.545
8 | Irregularity | 31.909 | 24 | Maximum probability | 0.815
9 | Eccentricity | 36.304 | 25 | Uniformity | 3.446
10 | Length/Perimeter | 20.645 | 26 | Entropy sequence | 21.874
11 | Narrow factor | 25.811 | 27 | HOG | 39.459
12 | Perimeter ratio of physiological length and width | 26.125 | 28 | Mean of color | 15.047
13 | Zernike Moment | 46.319 | 29 | Standard deviation of color | 11.067
14 | Generic Fourier descriptor | 9.687 | 30 | Skewness of color | 19.198
15 | Elliptical eccentricity | 25.454 | 31 | Kurtosis of color | 20.161
16 | Center distance sequence | 16.327 | | |


Table 3 The accuracy of each feature (Flavia Dataset)


Number | Feature | Accuracy (%) | Number | Feature | Accuracy (%)
1 | Aspect ratio | 21.638 | 17 | Energy | 39.541
2 | Circularity | 21.717 | 18 | Entropy | 29.887
3 | Area convexity | 18.179 | 19 | Contrast | 28.371
4 | Perimeter convexity | 42.076 | 20 | Correlation | 16.607
5 | Rectangularity | 22.655 | 21 | Average | 34.386
6 | Compactness | 30.296 | 22 | Inverse Difference Moment | 22.569
7 | Hu’s Invariant Moment | 37.245 | 23 | Standard deviation and variance | 40.433
8 | Irregularity | 4.659 | 24 | Maximum probability | 35.994
9 | Eccentricity | 17.113 | 25 | Uniformity | 23.286
10 | Length/Perimeter | 36.239 | 26 | Entropy sequence | 16.874
11 | Narrow factor | 32.467 | 27 | HOG | 35.981
12 | Perimeter ratio of physiological length and width | 19.676 | 28 | Mean of color | 41.728
13 | Zernike Moment | 72.091 | 29 | Standard deviation of color | 46.918
14 | Generic Fourier descriptor | 29.485 | 30 | Skewness of color | 26.688
15 | Elliptical eccentricity | 26.887 | 31 | Kurtosis of color | 27.612
16 | Center distance sequence | 12.982 | | |


Fig. 2 The accuracy of each feature (Leafsnap Dataset)

Fig. 3 The accuracy of each feature (Flavia Dataset)


4 Methods Classification


Leaf recognition can be based on various features of the image. In many cases,
the botanist focuses on the leaf texture surface and the details of the veins; in some
cases, the teeth and the shape of the leaf's outer edges play an important role in the
final result. Researchers have also introduced other methods that achieve good
recognition results based on the general profile of the leaf, such as its holes or its
shape.

A review of the work in this field suggests a classification into three categories that
covers all of the research: methods based on leaf shape, methods based on leaf
texture, and methods based on leaf color. In this section, we summarize for each
category a number of studies from recent years.


4.1 Methods Based on Leaf Shape 


Because of the importance of leaf shape, many studies first segment the image
and then extract appropriate features. In [8], a new method comprising segmentation,
feature extraction and classification for plant species recognition is presented. It uses
geometric and leaf shape features and performs the classification with the Linear
Discriminant Classifier (LDC). The method was evaluated on the Flavia and Leafsnap
datasets, yielding accuracies of 93% and 71%, respectively. Another study that uses
leaf shape features for plant species recognition is [9], which also considers geometric
and leaf shape features. Four essential leaf features are extracted: the leaf length, the
leaf width, the leaf area and the leaf diameter. Classification is performed using the
K-Nearest Neighbor (KNN) algorithm. The system was evaluated on 32 plant species
and achieved an accuracy of 93.17%. Another method in this category is presented
in [10]. This study uses 9 leaf shape features and classifies plant species with the
moving median center (MMC) classifier. Evaluated on a dataset collected by the
authors, the system yielded an accuracy of 91%; on the Leafsnap dataset it reaches
an accuracy of 52%. Other methods that use leaf shape features are listed
in Table 4.


4.2 Methods Based on Leaf Texture 


One method that uses this category of features for plant species recognition is
presented in [25]. It is based on highlighting the leaf texture features. A combined
classifier is used for classification, consisting of Radial Basis Function (RBF) and
Learning Vector Quantization (LVQ) classifiers. The evaluation of this system
indicates an accuracy of 98.7%. Another method in this category is presented
in [3]. In this research, the texture features have been used along
Table 4 Comparison of methods in the field of leaf recognition
Method | Year | Used features | Classification method | Dataset | Accuracy (%)
Sekeroglu and Inan [11] | 2016 | | NN | Dataset with 27 plant species (4 images per plant) | 97.2
Kalyoncu and Toygar [8] | 2015 | shape | LDC | Leafsnap | 71
Kalyoncu and Toygar [8] | 2015 | shape | LDC | Flavia | 93
Zhu et al. [1] | 2015 | | SVM | Leaf scan [12] | 99.5
Zhu et al. [1] | 2015 | | SVM | Leaf [13] | 52.9
Zhu et al. [1] | 2015 | | SVM | Flower [14] | 60
Zhu et al. [1] | 2015 | | SVM | Fruit [15] | 57.1
Wang et al. [16] | 2015 | texture, shape | PCNN | Flavia | 96.67
Wang et al. [16] | 2015 | texture, shape | PCNN | ICL | 91.56
Wang et al. [16] | 2015 | texture, shape | PCNN | MEW2012 | 91.2
Quadri and Sirshar [17] | 2015 | | SVM | Part of the Flavia dataset | 100
Chaki et al. [3] | 2015 | texture, shape | MLP | Dataset with 31 plant species | 97.6
Chaki et al. [3] | 2015 | texture, shape | NFC | Dataset with 31 plant species | 85.6
VijayaLakshmi and Mohan [18] | 2015 | color, texture, shape | Combination of PSO and FRVM | Dataset with 60 plant species | 99.87
Nidheesh et al. [9] | 2015 | shape | K-NN | Dataset with 60 plant species and 1800 images | 93.17
Ghasab et al. [19] | 2015 | | SVM | Dataset from several plant species from several families of trees (citrus, etc.) | 95.55
Ghasab et al. [19] | 2015 | | SVM | Flavia | 96.25
Arunpriya and Thanamani [2] | 2014 | | RBF | Dataset with 5 plant species | 88.6
Arunpriya and Thanamani [2] | 2014 | | ANFIS | Dataset with 5 plant species | 91.4
Amlekar et al. [20] | 2014 | | ANN | Dataset with 221 images | 91
Narayan and Subbarayan [21] | 2014 | color, shape | SVM | Dataset with 50 images | 88
Hu et al. [22] | 2012 | | MDM | ICL | 91.33
Hu et al. [22] | 2012 | | MDM | Swedish Leaf [23] | 98.2
Hu et al. [22] | 2012 | | MDM | Leafsnap | 69
Husin et al. [24] | 2012 | | NN | Dataset with 20 plant species | 98.9
Rashad et al. [25] | 2011 | texture | Combination of LVQ and RBF | Dataset with 60 images | 98.7
Kadir et al. [26] | 2011 | color, texture, shape | PNN | Dataset with 32 plant species | 93.75
Du [10] | 2007 | shape | MMC | Dataset with 10 plant species | 91
Du [10] | 2007 | shape | MMC | Leafsnap | 52
with the leaf shape features. Two types of neural networks are used for
classification: a Neuro-Fuzzy Classifier (NFC) and a Multi-Layer Perceptron
(MLP). The system is evaluated on a dataset of 930 images covering
31 plant species. The evaluation results show an accuracy of 97.6% for NFC and an
accuracy of 85.6% for MLP. Another method that uses these features is presented
in [16]. In addition to the leaf texture features, it uses leaf shape features for
plant species recognition. The study considers 5 features from these two categories
and uses a Pulse Coupled Neural Network (PCNN) classifier. The technique was evaluated
on the Flavia dataset, the ICL dataset [27] and the MEW2012 dataset [28], with
accuracies of 96.67% for the Flavia dataset, 91.56% for the ICL dataset and
91.2% for the MEW2012 dataset. Other methods that use leaf texture
features are listed in Table 4.


4.3 Methods Based on Leaf Color 


One method that uses this category of features for plant species recognition is
presented in [21]. In that study, leaf color features are used along with leaf shape
features, for a total of 26 features, and a Support Vector Machine (SVM) is used for
classification. The method was assessed on a dataset with 50 plant species, of which
70% was used for training and 30% for testing. The evaluation shows an accuracy of
88%. Another method in this group is presented in [26]. In that study, 5 types of
features are extracted from the leaf surface, and a Probabilistic Neural Network (PNN)
is used for classification. The method was evaluated on a dataset with 32 leaf species,
resulting in an accuracy of 93.75%. A further method that uses color features is
presented in [18]. In addition to the leaf color features, it uses leaf texture
features for plant species recognition, and it performs the classification with a
combination of the Particle Swarm Optimization (PSO) algorithm and a Fuzzy
Support Vector Machine (FSVM). The method was evaluated on a dataset of 60 plant
species, with 50 training images and 100 test images per species, and reached an
accuracy of 99.87%. Other methods that use leaf color features are listed in Table 4.


4.4 Methods Summarization 


As described at the beginning of this section, the methods proposed for plant
species recognition can be divided into three groups. Table 4 lists a large number
of the methods presented in recent years. Each of these methods has advantages and
disadvantages.

Most of the methods presented in recent years suffer from incompleteness: they are
evaluated on datasets that cover only a limited variety of plant species. Examples
are the methods presented in [3, 10, 21-25]. Other methods are evaluated on standard
datasets that cover a small number of plant species with little similarity between
them. Some methods, including those presented in [1, 8, 10, 21, 22], are evaluated
on datasets with a large number of species that closely resemble one another. These
methods have a fairly low accuracy, which is due to the similar appearance of the
plant species and the use of leaf shape features alone. Thus, accuracy is one of
their greatest disadvantages. Another disadvantage is the time complexity of some
of the proposed methods. Examples where time complexity is an important
consideration are the methods presented in [8, 10], which have a relatively high
time complexity. One of the most important reasons for this is the extraction of
inefficient features.


5 Conclusion 


Plant species recognition is essential. It was initially carried out by qualified
individuals, an approach that has some limitations. Leaf recognition is simplified
by using image processing. In this article, we have reviewed some of the methods
proposed in the past few years. Close to 31 features have been used for plant
species recognition, and many classifiers have been applied in these papers. Reviewing
these studies reveals many open problems in the field of leaf recognition,
including the lack of a standard dataset for assessing methods, the need to discover
new features that increase recognition accuracy, the optimization of
classifiers, and the reduction of the time complexity of the methods. Each of these
issues can be considered a guideline for future work.
Biharmonic Functions
Abstract We develop a hypercomplex method for solving boundary value problems
for biharmonic functions. This method is based on a relation between biharmonic
functions and monogenic functions taking values in a commutative algebra
associated with the biharmonic equation. We consider Schwarz-type boundary
value problems for monogenic functions that are related to plane elasticity.
Using expressions of solutions by hypercomplex integrals analogous to the classical
Schwarz and Cauchy integrals, we reduce the considered problems to systems of
integral equations and establish sufficient conditions under which these systems have the
Fredholm property. This is done for the case where the boundary of the domain belongs to
a class wider than the class of Lyapunov curves that was usually required in
plane elasticity theory.


Keywords Biharmonic equation - Biharmonic boundary value problem - 
Biharmonic algebra - Biharmonic plane - Monogenic function - Schwarz-type 
boundary value problem - Biharmonic Cauchy type integral - Fredholm integral 
equations 


1 Main Boundary Value Problems for Biharmonic
Functions


Let $D$ be a bounded domain in the Cartesian plane $xOy$, and let its boundary $\partial D$ be
a closed smooth Jordan curve. Let $\mathbb{R}$ be the set of real numbers.
Consider the main boundary value problems for biharmonic functions $W\colon D \to \mathbb{R}$
which have continuous partial derivatives up to the fourth order inclusively and satisfy
the biharmonic equation in the domain $D$:

$$\Delta^2 W(x,y) \equiv \frac{\partial^4 W(x,y)}{\partial x^4} + 2\,\frac{\partial^4 W(x,y)}{\partial x^2\,\partial y^2} + \frac{\partial^4 W(x,y)}{\partial y^4} = 0. \tag{1}$$

The principal biharmonic problem (cf., e.g., [1, p. 194] and [2, p. 13]) consists of
finding a function $W\colon D \to \mathbb{R}$ which is continuous together with its partial derivatives
of the first order in the closure $\overline{D}$ of the domain $D$ and is biharmonic in $D$, when its
values and the values of its outward normal derivative are given on the boundary $\partial D$:

$$W(x_0,y_0) = \omega_1(s), \qquad \frac{\partial W}{\partial \mathbf{n}}(x_0,y_0) = \omega_2(s) \qquad \forall\,(x_0,y_0) \in \partial D, \tag{2}$$

where $s$ is an arc coordinate of the point $(x_0,y_0) \in \partial D$.

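In the case where $\omega_1$ is a continuously differentiable function, the principal
biharmonic problem is equivalent to the following biharmonic problem (cf., e.g., [1, p.
194] and [2, p. 13]) of finding a biharmonic function $V\colon D \to \mathbb{R}$ with the following
boundary conditions:

$$\begin{aligned}
&\lim_{(x,y)\to(x_0,y_0),\ (x,y)\in D} \frac{\partial V(x,y)}{\partial x} = \omega_3(s),\\
&\lim_{(x,y)\to(x_0,y_0),\ (x,y)\in D} \frac{\partial V(x,y)}{\partial y} = \omega_4(s) \qquad \forall\,(x_0,y_0)\in\partial D,\\
&\int_{\partial D}\bigl(\omega_3(s)\cos\angle(\mathbf{s},x) + \omega_4(s)\cos\angle(\mathbf{s},y)\bigr)\,ds = 0.
\end{aligned} \tag{3}$$

Here the given boundary functions $\omega_3$, $\omega_4$ are related to the given functions $\omega_1$, $\omega_2$ of
the problem (2), viz.,

$$\begin{aligned}
\omega_3(s) &= \omega_1'(s)\cos\angle(\mathbf{s},x) + \omega_2(s)\cos\angle(\mathbf{n},x),\\
\omega_4(s) &= \omega_1'(s)\cos\angle(\mathbf{s},y) + \omega_2(s)\cos\angle(\mathbf{n},y),
\end{aligned}$$

where $\mathbf{s}$ and $\mathbf{n}$ denote unit vectors of the tangent and the outward normal to the
boundary $\partial D$, respectively, and $\angle(\cdot,\cdot)$ denotes the angle between the corresponding
vector ($\mathbf{s}$ or $\mathbf{n}$) and the positive direction of the coordinate axis ($x$ or $y$) indicated in
the parentheses. Furthermore, solutions of the problems (2) and (3) are related by the
equality $V(x,y) = W(x,y) + c$, where $c \in \mathbb{R}$.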

A technique of using analytic functions of the complex variable for solving the
biharmonic problem is based on an expression of biharmonic functions by the
Goursat formula. This expression allows one to reduce the biharmonic problem to a certain
boundary value problem for a pair of analytic functions. Further, expressing the
analytic functions via Cauchy type integrals, one can obtain a system of
integro-differential equations in the general case. In the case where the boundary $\partial D$ is a
Lyapunov curve, the mentioned system can be reduced to a system of Fredholm
equations. Such a scheme is developed (cf., e.g., [3-6]) for solving the main
problems of plane elasticity theory using a special biharmonic function which is
called the Airy stress function.

Other methods for reducing boundary value problems of plane elasticity
theory to integral equations are developed in [2, 7-12].

In particular, in the paper [12], for solving the biharmonic problem we
developed a method which is based on the relation between biharmonic functions and
monogenic functions taking values in a commutative algebra. Using an expression
of a monogenic function by a hypercomplex analog of the Cauchy type integral and
considering a Schwarz type boundary value problem for monogenic functions, we
reduced the biharmonic problem to a system of integral equations and established
sufficient conditions under which this system has the Fredholm property. This was done
for the case where the boundary of the domain belongs to a class wider than the
class of Lyapunov curves that was usually required in plane elasticity theory (cf.,
e.g., [2-4, 9, 11]).

In this paper, improving the mentioned method of the paper [12], we weaken the
conditions on the boundary of the domain and on the given boundary functions in comparison
with the results of the paper [12].


2 Monogenic Functions in a Biharmonic Algebra 


We say that an associative commutative two-dimensional algebra $\mathbb{B}$ with unit 1 over
the field of complex numbers $\mathbb{C}$ is biharmonic (this notion was proposed by Kovalev
and Mel'nichenko in [13]) if in $\mathbb{B}$ there exists a biharmonic basis, i.e., a basis $\{e_1, e_2\}$
satisfying the conditions

$$(e_1^2 + e_2^2)^2 = 0, \qquad e_1^2 + e_2^2 \neq 0. \tag{4}$$

Kovalev and Mel'nichenko [13] found a multiplication table for a biharmonic
basis $\{e_1, e_2\}$:

$$e_1 = 1, \qquad e_2^2 = e_1 + 2i e_2, \tag{5}$$

where $i$ is the imaginary complex unit.

Study [14] proved that there exist only two types of two-dimensional algebras with
unit 1 over the field $\mathbb{C}$ (up to isomorphism). Mel'nichenko [17] proved that there exists
a unique biharmonic algebra $\mathbb{B}$, with a non-biharmonic basis $\{1, \rho\}$ for which

$$\rho = 2e_1 + 2i e_2 \tag{6}$$

and $\rho^2 = 0$. It means that only one of the algebras considered by Study [14] is biharmonic.
Note that the algebra $\mathbb{B}$ is also isomorphic to the four-dimensional algebras over the field
of real numbers $\mathbb{R}$ considered by Douglis [15] and Sobrero [16].
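
The algebraic relations (4)-(6) are easy to check mechanically. The following sketch is an illustration added here, not code from the cited papers: the class name `B` and the constants `E1`, `E2`, `RHO`, `ZERO` are hypothetical. It stores an element $z_1 e_1 + z_2 e_2$ as a pair of complex coefficients and multiplies by the table (5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class B:
    """Element z1*e1 + z2*e2 of the biharmonic algebra, with complex z1, z2."""
    z1: complex
    z2: complex

    def __add__(self, other):
        return B(self.z1 + other.z1, self.z2 + other.z2)

    def __mul__(self, other):
        # Table (5): e1*e1 = e1, e1*e2 = e2, e2*e2 = e1 + 2i*e2.
        a, b, c, d = self.z1, self.z2, other.z1, other.z2
        return B(a * c + b * d, a * d + b * c + 2j * b * d)


E1 = B(1, 0)
E2 = B(0, 1)
ZERO = B(0, 0)
RHO = B(2, 2j)               # rho = 2*e1 + 2i*e2, equality (6)

SUM_SQ = E1 * E1 + E2 * E2   # e1^2 + e2^2, the element appearing in (4)

if __name__ == "__main__":
    print(RHO * RHO == ZERO)        # rho is nilpotent
    print(SUM_SQ == RHO)            # e1^2 + e2^2 coincides with rho
    print(SUM_SQ * SUM_SQ == ZERO)  # first condition in (4)
```

With these definitions $e_1^2 + e_2^2 = \rho \neq 0$ and $\rho^2 = 0$, which are exactly the two conditions in (4).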

We use the Euclidean norm $\|a\| := \sqrt{|z_1|^2 + |z_2|^2}$ in the algebra $\mathbb{B}$, where
$a = z_1 e_1 + z_2 e_2$ and $z_1, z_2 \in \mathbb{C}$.

Consider a biharmonic plane $\mu_{e_1,e_2} := \{\zeta = x e_1 + y e_2 : x, y \in \mathbb{R}\}$ which is a
linear span over the field $\mathbb{R}$ of the elements of a biharmonic basis $\{e_1, e_2\}$ satisfying (5).

With a domain $D$ of the Cartesian plane $xOy$ we associate the congruent domain
$D_\zeta := \{\zeta = x e_1 + y e_2 : (x,y) \in D\}$ in the biharmonic plane $\mu_{e_1,e_2}$, and the congruent
domain $D_z := \{z = x + iy : (x,y) \in D\}$ in the complex plane $\mathbb{C}$. Their boundaries are
denoted by $\partial D$, $\partial D_\zeta$ and $\partial D_z$, respectively. Let $\overline{D_\zeta}$ (or $\overline{D_z}$, $\overline{D}$) be the closure of
the domain $D_\zeta$ (or $D_z$, $D$, respectively).

In what follows, $\zeta = x e_1 + y e_2$, $z = x + iy$, where $(x,y) \in D$, and $\zeta_0 = x_0 e_1 + y_0 e_2$,
$z_0 = x_0 + i y_0$, where $(x_0,y_0) \in \partial D$.

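Any function $\Phi\colon D_\zeta \to \mathbb{B}$ has an expansion

$$\Phi(\zeta) = U_1(x,y)\,e_1 + U_2(x,y)\,i e_1 + U_3(x,y)\,e_2 + U_4(x,y)\,i e_2, \tag{7}$$

where $U_l\colon D \to \mathbb{R}$, $l \in \{1,2,3,4\}$, are real-valued component-functions. We shall use
the following notation: $U_l[\Phi] := U_l$, $l \in \{1,2,3,4\}$.

We say that a function $\Phi\colon D_\zeta \to \mathbb{B}$ is monogenic in a domain $D_\zeta$ if it has the
classical derivative

$$\Phi'(\zeta) := \lim_{h\to 0,\ h\in\mu_{e_1,e_2}} \bigl(\Phi(\zeta+h) - \Phi(\zeta)\bigr)h^{-1}$$

at every point $\zeta \in D_\zeta$.

It is proved in [13] that a function $\Phi\colon D_\zeta \to \mathbb{B}$ is monogenic in $D_\zeta$ if and only
if each of its real-valued component-functions in (7) is real differentiable in $D$ and the
following analog of the Cauchy-Riemann condition is satisfied:

$$\frac{\partial \Phi(\zeta)}{\partial y} = \frac{\partial \Phi(\zeta)}{\partial x}\,e_2. \tag{8}$$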


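Rewriting the condition (8) in the extended form, we obtain the system of four
equations (cf. [13]) with respect to the component-functions $U_k$, $k \in \{1,2,3,4\}$, in (7):

$$\begin{aligned}
\frac{\partial U_1(x,y)}{\partial y} &= \frac{\partial U_3(x,y)}{\partial x},\\
\frac{\partial U_2(x,y)}{\partial y} &= \frac{\partial U_4(x,y)}{\partial x},\\
\frac{\partial U_3(x,y)}{\partial y} &= \frac{\partial U_1(x,y)}{\partial x} - 2\,\frac{\partial U_4(x,y)}{\partial x},\\
\frac{\partial U_4(x,y)}{\partial y} &= \frac{\partial U_2(x,y)}{\partial x} + 2\,\frac{\partial U_3(x,y)}{\partial x}.
\end{aligned} \tag{9}$$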
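
As a quick illustration of the system (9) (a worked example added here, not taken from [13]), consider $\Phi(\zeta) = \zeta^2$ with $\zeta = x e_1 + y e_2$. Expanding with the multiplication table (5) gives the four components explicitly, and each equation of (9) can be checked by hand:

```latex
\begin{align*}
\zeta^2 &= x^2 e_1 + 2xy\,e_2 + y^2 e_2^2
         = (x^2 + y^2)\,e_1 + 2xy\,e_2 + 2y^2\, i e_2,\\
U_1 &= x^2 + y^2,\qquad U_2 = 0,\qquad U_3 = 2xy,\qquad U_4 = 2y^2,\\
\frac{\partial U_1}{\partial y} &= 2y = \frac{\partial U_3}{\partial x},\qquad
\frac{\partial U_2}{\partial y} = 0 = \frac{\partial U_4}{\partial x},\\
\frac{\partial U_3}{\partial y} &= 2x = \frac{\partial U_1}{\partial x} - 2\,\frac{\partial U_4}{\partial x},\qquad
\frac{\partial U_4}{\partial y} = 4y = \frac{\partial U_2}{\partial x} + 2\,\frac{\partial U_3}{\partial x}.
\end{align*}
```

Note that $U_1 = x^2 + y^2$ is not harmonic ($\Delta U_1 = 4$) but is biharmonic ($\Delta^2 U_1 = 0$), which is the behaviour one expects of the components of a monogenic function in this algebra.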
It is established in [18, 19] that every monogenic function $\Phi\colon D_\zeta \to \mathbb{B}$ has
derivatives $\Phi^{(n)}(\zeta)$ of all orders $n$ in the domain $D_\zeta$ and, therefore, it satisfies the
two-dimensional biharmonic equation (1) in the domain $D$ due to the relations (4) and
the equality

$$\Delta^2 \Phi(\zeta) = \Phi^{(4)}(\zeta)\,(e_1^2 + e_2^2)^2.$$

Therefore, in [20] we called such a function $\Phi$ a biharmonic monogenic function in $D_\zeta$.

Every component $U_l\colon D \to \mathbb{R}$, $l \in \{1,2,3,4\}$, of the expansion (7) of a biharmonic
monogenic function $\Phi\colon D_\zeta \to \mathbb{B}$ also satisfies Eq. (1), i.e., $U_l$ is a biharmonic
function in the domain $D$.

At the same time, every function $U(x,y)$ biharmonic in $D$ is the first component
$U_1 = U$ in the expansion (7) of a certain biharmonic monogenic function
$\Phi\colon D_\zeta \to \mathbb{B}$ and, moreover, all such functions $\Phi$ are found in [18, 19] in an explicit
form.

Every monogenic function $\Phi\colon D_\zeta \to \mathbb{B}$ is expressed via two corresponding
analytic functions $F\colon D_z \to \mathbb{C}$, $F_0\colon D_z \to \mathbb{C}$ of the complex variable $z$ in the
form (cf., e.g., [18, 19]):

$$\Phi(\zeta) = F(z)\,e_1 - \Bigl(\frac{iy}{2}\,F'(z) - F_0(z)\Bigr)\rho \qquad \forall\,\zeta \in D_\zeta. \tag{10}$$

The equality (10) establishes a one-to-one correspondence between monogenic
functions $\Phi$ in the domain $D_\zeta$ and pairs of complex-valued analytic functions $F$, $F_0$ in
the domain $D_z$.
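
The shape of the representation (10) can be motivated by a short computation (a sketch added for orientation; the full proof is in [18, 19]). Equality (6) lets one express $e_2$ through $\rho$, and since $\rho^2 = 0$ the Taylor expansion of an analytic function $F$ about $z$ stops after the first-order term. Here $F(\zeta)$ denotes the extension of $F$ defined by that Taylor series, which is an assumption of this sketch:

```latex
\begin{align*}
e_2 &= i e_1 - \tfrac{i}{2}\,\rho, \qquad
\zeta = x e_1 + y e_2 = z\,e_1 - \tfrac{iy}{2}\,\rho,\\
F(\zeta) &= F(z)\,e_1 - \tfrac{iy}{2}\,F'(z)\,\rho
\qquad (\text{all higher terms contain } \rho^2 = 0),\\
\Phi(\zeta) &= F(\zeta) + F_0(z)\,\rho
 = F(z)\,e_1 - \Bigl(\tfrac{iy}{2}\,F'(z) - F_0(z)\Bigr)\rho .
\end{align*}
```

The second analytic function $F_0$ enters only through the nilpotent direction $\rho$, which is why two analytic functions are needed to describe one monogenic function.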

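An example of a function monogenic in both domains $D_\zeta$ and $\mu_{e_1,e_2} \setminus \overline{D_\zeta}$ is the
biharmonic Cauchy type integral

$$B[\varphi](\zeta) := \frac{1}{2\pi i} \int_{\partial D_\zeta} \varphi(\tau)(\tau - \zeta)^{-1}\,d\tau \tag{11}$$

with a continuous density $\varphi\colon \partial D_\zeta \to \mathbb{B}$.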


3  Schwarz-Type Boundary Value Problems for Monogenic 
Functions 


Consider a boundary value problem of finding a function $\Phi\colon D_\zeta \to \mathbb{B}$ which is
monogenic in a domain $D_\zeta$ when limiting values of two component-functions in (7)
are given on the boundary $\partial D_\zeta$, i.e., the following boundary conditions are satisfied:

$$U_k(x_0,y_0) = u_k(\zeta_0), \qquad U_m(x_0,y_0) = u_m(\zeta_0) \qquad \forall\,\zeta_0 \in \partial D_\zeta$$

for $1 \le k < m \le 4$, where

$$U_l(x_0,y_0) := \lim_{\zeta\to\zeta_0,\ \zeta\in D_\zeta} U_l[\Phi(\zeta)], \qquad l \in \{k, m\},$$

and $u_k$, $u_m$ are given continuous functions.

We shall call such a problem the (k-m)-problem.

Kovalev [21] considered (k-m)-problems under the additional assumption that the
sought-for function $\Phi$ is continuous in the closure $\overline{D_\zeta}$. He called such problems
biharmonic Schwarz problems owing to their analogy with the classical Schwarz
problem of finding an analytic function of a complex variable when values of its real
part are given on the boundary of the domain. In [22], we called problems of this type
(k-m)-problems in the sense of Kovalev.

(k-m)-problems in the sense of Kovalev are investigated in the papers [12, 20, 21,
23-27].

Kovalev [21] established that all (k-m)-problems are reduced to three main
problems: those with $k = 1$ and $m \in \{2, 3, 4\}$, respectively. He showed that the biharmonic
problem is reduced to the (1-3)-problem. Indeed, let $\Phi_1$ be a function monogenic in $D_\zeta$
having the sought-for function $V(x,y)$ of the problem (3) as the first component:

$$\Phi_1(\zeta) = V(x,y)\,e_1 + V_2(x,y)\,i e_1 + V_3(x,y)\,e_2 + V_4(x,y)\,i e_2.$$

It follows from the condition (9) for $\Phi = \Phi_1$ that $\partial V_3(x,y)/\partial x = \partial V(x,y)/\partial y$.
Therefore,

$$\Phi_1'(\zeta) = \frac{\partial V(x,y)}{\partial x}\,e_1 + \frac{\partial V_2(x,y)}{\partial x}\,i e_1 + \frac{\partial V(x,y)}{\partial y}\,e_2 + \frac{\partial V_4(x,y)}{\partial x}\,i e_2 \tag{12}$$

and, as a consequence, we conclude that the biharmonic problem with boundary
conditions (3) is reduced to the (1-3)-problem of finding a function $\Phi \equiv \Phi_1'$ monogenic
in $D_\zeta$ when values of two components $U_1 = \partial V(x,y)/\partial x$ and $U_3 = \partial V(x,y)/\partial y$
of the expansion (7) are given on the boundary $\partial D_\zeta$.

A relation between the (1-4)-problem and boundary value problems of plane
elasticity theory is established in [27], where a problem of finding an elastic equilibrium
for an isotropic body occupying $D$ with given limiting values of the partial derivatives
$\partial u/\partial x$, $\partial v/\partial y$ of the displacements $u = u(x,y)$, $v = v(x,y)$ on the boundary $\partial D$ is
considered. It is shown in [27] that such a problem is reduced to the (1-4)-problem (see
also [22, 26]).


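4 Solutions Represented by Means of Hypercomplex
Cauchy-Type Integrals


We shall find solutions of the (1-3)-problem and the (1-4)-problem in the classes of
functions represented in the form

$$\Phi(\zeta) = B[\varphi](\zeta) \qquad \forall\,\zeta \in D_\zeta, \tag{13}$$

where

$$\varphi(\zeta) = \varphi_1(\zeta)\,e_1 + \varphi_3(\zeta)\,e_2 \qquad \forall\,\zeta \in \partial D_\zeta \tag{14}$$

for the (1-3)-problem or

$$\varphi(\zeta) = \varphi_1(\zeta)\,e_1 + \varphi_4(\zeta)\,i e_2 \qquad \forall\,\zeta \in \partial D_\zeta \tag{15}$$

for the (1-4)-problem, respectively, and every function $\varphi_k\colon \partial D_\zeta \to \mathbb{R}$, $k \in \{1, 3, 4\}$,
is continuous.

In the papers [12, 26] we found solutions of the mentioned problems under the
additional assumption that the function $\varphi\colon \partial D_\zeta \to \mathbb{B}$ satisfies the Dini condition

$$\int_0^1 \frac{\omega(\varphi, \eta)}{\eta}\,d\eta < \infty, \tag{16}$$

where

$$\omega(\varphi, \varepsilon) := \sup_{\tau_1,\tau_2 \in \partial D_\zeta,\ \|\tau_1 - \tau_2\| \le \varepsilon} \|\varphi(\tau_1) - \varphi(\tau_2)\|$$

is the modulus of continuity of the function $\varphi$.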
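
To see what the Dini condition (16) does and does not allow, it helps to test it on model majorants (standard examples added here for illustration; they are not taken from [12, 26]). Every Hölder modulus satisfies it, while logarithmic moduli lie on both sides of the borderline; the substitution $u = \ln(e/\eta)$ is used in the last two lines:

```latex
\begin{align*}
\omega(\eta) = \eta^{\alpha},\ 0 < \alpha \le 1:&\quad
\int_0^1 \frac{\eta^{\alpha}}{\eta}\,d\eta = \frac{1}{\alpha} < \infty,\\
\omega(\eta) = \ln^{-2}\frac{e}{\eta}:&\quad
\int_0^1 \frac{d\eta}{\eta\,\ln^{2}(e/\eta)} = \int_1^{\infty} \frac{du}{u^{2}} = 1 < \infty,\\
\omega(\eta) = \ln^{-1}\frac{e}{\eta}:&\quad
\int_0^1 \frac{d\eta}{\eta\,\ln(e/\eta)} = \int_1^{\infty} \frac{du}{u} = \infty .
\end{align*}
```

So a density that is merely continuous, with a modulus of continuity decaying as slowly as in the third line, need not satisfy (16).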

Below, we shall show that the condition (16) can be dropped and, moreover, that
this makes it possible to weaken the assumptions on the boundary $\partial D_\zeta$
and on the given boundary functions $u_1$, $u_3$ for the (1-3)-problem or $u_1$, $u_4$ for the
(1-4)-problem in comparison with the results of the papers [12, 26].


5 Auxiliary Statements 


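The existence of the limiting values

$$B^{+}[\varphi](\zeta_0) := \lim_{\zeta\to\zeta_0,\ \zeta\in D_\zeta} B[\varphi](\zeta), \qquad
B^{-}[\varphi](\zeta_0) := \lim_{\zeta\to\zeta_0,\ \zeta\in \mu_{e_1,e_2}\setminus\overline{D_\zeta}} B[\varphi](\zeta)$$

of the biharmonic Cauchy type integral (11) at any point $\zeta_0 \in \partial D_\zeta$ is naturally related
to the existence of the singular integral, which is understood in the sense of its Cauchy
principal value:

$$\int_{\partial D_\zeta} \varphi(\tau)(\tau - \zeta_0)^{-1}\,d\tau := \lim_{\varepsilon\to 0} \int_{\partial D_\zeta \setminus \partial D_\zeta^{\varepsilon}(\zeta_0)} \varphi(\tau)(\tau - \zeta_0)^{-1}\,d\tau,$$

where $\zeta_0 \in \partial D_\zeta$ and $\partial D_\zeta^{\varepsilon}(\zeta_0) := \{\tau \in \partial D_\zeta : \|\tau - \zeta_0\| \le \varepsilon\}$.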
The following lemma presents necessary and sufficient conditions for the continuous
extensibility of the biharmonic Cauchy type integral (11) to the boundary of the domain.
It is completely analogous to the corresponding result obtained by Salaev
[28] for the complex Cauchy type integral, for which Tokov [29] gave a simpler proof.


Lemma 1 Integral (11) is continuously extended to the boundary from either the
domain $D_\zeta$ or the domain $\mu_{e_1,e_2} \setminus \overline{D_\zeta}$ if and only if the integral

$$\int_{\partial D_\zeta \setminus \partial D_\zeta^{\varepsilon}(\zeta_0)} \bigl(\varphi(\tau) - \varphi(\zeta_0)\bigr)(\tau - \zeta_0)^{-1}\,d\tau \tag{17}$$

converges uniformly with respect to $\zeta_0 \in \partial D_\zeta$ as $\varepsilon \to 0$.

If integral (17) converges uniformly with respect to $\zeta_0 \in \partial D_\zeta$ as $\varepsilon \to 0$, then
integral (11) has limiting values $B^{\pm}[\varphi](\zeta_0)$ at any point $\zeta_0 \in \partial D_\zeta$ that are represented
by the Sokhotski-Plemelj formulas:

$$\begin{aligned}
B^{+}[\varphi](\zeta_0) &= \varphi(\zeta_0) + \frac{1}{2\pi i} \int_{\partial D_\zeta} \bigl(\varphi(\tau) - \varphi(\zeta_0)\bigr)(\tau - \zeta_0)^{-1}\,d\tau,\\
B^{-}[\varphi](\zeta_0) &= \frac{1}{2\pi i} \int_{\partial D_\zeta} \bigl(\varphi(\tau) - \varphi(\zeta_0)\bigr)(\tau - \zeta_0)^{-1}\,d\tau.
\end{aligned} \tag{18}$$


Corollary 1 If integral (11) is continuously extended to the boundary from the
domain $D_\zeta$, then it is also continuously extended to the boundary from the domain
$\mu_{e_1,e_2} \setminus \overline{D_\zeta}$, and the formulas (18) are true for any point $\zeta_0 \in \partial D_\zeta$.


For a set $E \subset \mathbb{C}$ and a point $t_0 \in \mathbb{C}$, we use the notation
$E^{\varepsilon}(t_0) := \{t \in E : |t - t_0| \le \varepsilon\}$. We also use the modulus of continuity of a function
$h\colon E \to \mathbb{C}$ continuous on the set $E$:

$$\omega_E(h, \varepsilon) := \sup_{t_1,t_2 \in E,\ |t_1 - t_2| \le \varepsilon} |h(t_1) - h(t_2)|.$$

Let $\mathbb{U} := \{Z \in \mathbb{C} : |Z| < 1\}$ be the unit disk and $\Gamma := \{S \in \mathbb{C} : |S| = 1\}$ be the
unit circle in the complex plane $\mathbb{C}$.
We shall use the following statement on the continuous extensibility to the boundary
of the complex Cauchy type integral

$$\frac{1}{2\pi i} \int_{\Gamma} \frac{\Omega(S, Z)}{S - Z}\,dS, \qquad Z \in \mathbb{U}, \tag{19}$$

in the case where its density $\Omega(S, Z)$ also depends on the variable $Z$.
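Lemma 2 Assume that a function $\Omega\colon \Gamma \times \overline{\mathbb{U}} \to \mathbb{C}$ satisfies the estimates

$$|\Omega(S, Z)| \le c\,\omega(|S - Z|) \qquad \forall\,S \in \Gamma\ \ \forall\,Z \in \overline{\mathbb{U}}, \tag{20}$$

$$|\Omega(S, Z) - \Omega(S, Z_0)| \le c\,\frac{\omega(|S - Z_0|)}{|S - Z_0|}\,|Z - Z_0| \qquad \forall\,Z \in \mathbb{U}\ \ \forall\,S, Z_0 \in \Gamma : |S - Z_0| \ge 2|Z - Z_0|, \tag{21}$$

where the constant $c$ does not depend on $S$, $Z$, $Z_0$ and the majorant $\omega(\cdot)$ satisfies
the Dini condition of type (16). Then the function (19) is continuously extended to
the boundary $\Gamma$ and

$$\lim_{Z\to Z_0,\ Z\in\mathbb{U}} \frac{1}{2\pi i} \int_{\Gamma} \frac{\Omega(S, Z)}{S - Z}\,dS = \frac{1}{2\pi i} \int_{\Gamma} \frac{\Omega(S, Z_0)}{S - Z_0}\,dS \qquad \forall\,Z_0 \in \Gamma. \tag{22}$$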


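Proof The integral on the right-hand side of equality (22) exists due to estimate (20)
and the condition of type (16) for the majorant $\omega(\cdot)$.

Let $Z_0$ be any fixed point of $\Gamma$ and let $Z \in \mathbb{U}$ be such that $\varepsilon := |Z - Z_0| < 1$.

Let us consider the equality

$$\begin{aligned}
\int_{\Gamma} \frac{\Omega(S, Z)}{S - Z}\,dS - \int_{\Gamma} \frac{\Omega(S, Z_0)}{S - Z_0}\,dS
&= \int_{\Gamma^{2\varepsilon}(Z_0)} \frac{\Omega(S, Z)}{S - Z}\,dS - \int_{\Gamma^{2\varepsilon}(Z_0)} \frac{\Omega(S, Z_0)}{S - Z_0}\,dS\\
&\quad + \int_{\Gamma\setminus\Gamma^{2\varepsilon}(Z_0)} \frac{\Omega(S, Z) - \Omega(S, Z_0)}{S - Z}\,dS
 + (Z - Z_0) \int_{\Gamma\setminus\Gamma^{2\varepsilon}(Z_0)} \frac{\Omega(S, Z_0)}{(S - Z)(S - Z_0)}\,dS\\
&=: J_1 - J_2 + J_3 + J_4
\end{aligned}$$

and estimate the moduli of the integrals $J_l$, $l \in \{1,2,3,4\}$.
Taking into account the inequalities $|S - Z| \le |Z - Z_0| + |S - Z_0| \le 3\varepsilon$ for all
$S \in \Gamma^{2\varepsilon}(Z_0)$ and estimate (20), we obtain the relations

$$|J_1| \le \int_{\Gamma^{2\varepsilon}(Z_0)} \frac{|\Omega(S, Z)|}{|S - Z|}\,|dS| \le c \int_{\Gamma^{2\varepsilon}(Z_0)} \frac{\omega(|S - Z|)}{|S - Z|}\,|dS| \le c \int_0^{3\varepsilon} \frac{\omega(\eta)}{\eta}\,d\eta \to 0, \qquad \varepsilon \to 0,$$

due to the condition of type (16) for the majorant $\omega(\cdot)$.
The integral $J_2$ is similarly estimated.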
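Taking into account the inequalities $|S - Z_0| \le |S - Z| + |Z - Z_0| \le 3|S - Z|/2$
for all $S \in \Gamma \setminus \Gamma^{2\varepsilon}(Z_0)$ and estimate (21), we obtain the relations

$$|J_3| \le \int_{\Gamma\setminus\Gamma^{2\varepsilon}(Z_0)} \frac{|\Omega(S, Z) - \Omega(S, Z_0)|}{|S - Z|}\,|dS|
\le c\,|Z - Z_0| \int_{\Gamma\setminus\Gamma^{2\varepsilon}(Z_0)} \frac{\omega(|S - Z_0|)}{|S - Z_0|^2}\,|dS|
\le c\,\varepsilon \int_{2\varepsilon}^{2} \frac{\omega(\eta)}{\eta^2}\,d\eta \to 0, \qquad \varepsilon \to 0,$$

due to the condition of type (16) for the majorant $\omega(\cdot)$.
In the same way, using estimate (20) instead of estimate (21), we obtain the relations

$$|J_4| \le c\,\varepsilon \int_{2\varepsilon}^{2} \frac{\omega(\eta)}{\eta^2}\,d\eta \to 0, \qquad \varepsilon \to 0.$$

Thus, equality (22) follows from the obtained estimates of the integrals
$J_l$, $l \in \{1,2,3,4\}$.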


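Consider the conformal mapping $\sigma(Z)$ of the unit disk $\mathbb{U}$ onto the domain $D_z$.
Denote $\sigma_1(Z) := \operatorname{Re}\sigma(Z)$, $\sigma_2(Z) := \operatorname{Im}\sigma(Z)$.

Inasmuch as the conformal mapping $\sigma$ is continued to a homeomorphism between
the closures of the corresponding domains, the function

$$\widetilde{\sigma}(Z) := \sigma_1(Z)\,e_1 + \sigma_2(Z)\,e_2 \qquad \forall\,Z \in \overline{\mathbb{U}} \tag{23}$$

generates a homeomorphic mapping of $\overline{\mathbb{U}}$ onto the set $\overline{D_\zeta}$.
Let $\tau = \widetilde{\sigma}(S)$, where $S \in \Gamma$, and $\zeta = \widetilde{\sigma}(Z)$, where $Z \in \mathbb{U}$. Then

$$(\tau - \zeta)^{-1} = \frac{1}{\sigma(S) - \sigma(Z)} + \frac{\sigma_2(S) - \sigma_2(Z)}{(\sigma(S) - \sigma(Z))^2}\,\frac{i\rho}{2}. \tag{24}$$

Assume that the conformal mapping $\sigma(Z)$ has the continuous contour derivative
$\sigma'(S)$ on the unit circle $\Gamma$. Then the mapping (23) also has the continuous contour
derivative $\widetilde{\sigma}'(S)$, which is expressed by the equality

$$\widetilde{\sigma}'(S) = \sigma'(S) - \sigma_2'(S)\,\frac{i\rho}{2} \qquad \forall\,S \in \Gamma. \tag{25}$$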
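
The inversion formula (24) follows from $e_2 = i e_1 - \frac{i}{2}\rho$ (a consequence of (6)) together with $\rho^2 = 0$, which give $(a + b\rho)^{-1} = 1/a - b\rho/a^2$ for complex $a \neq 0$. The sketch below checks this numerically. It is an illustration only: the helper names are hypothetical, an element $a + b\rho$ is stored as the pair `(a, b)`, and `x`, `y` stand for the differences $\sigma_1(S) - \sigma_1(Z)$ and $\sigma_2(S) - \sigma_2(Z)$.

```python
def mul(p, q):
    """Product of p = a1 + b1*rho and q = a2 + b2*rho, using rho**2 = 0."""
    (a1, b1), (a2, b2) = p, q
    return (a1 * a2, a1 * b2 + a2 * b1)


def embed(x, y):
    """Write x*e1 + y*e2 as a + b*rho, using e2 = i - (i/2)*rho."""
    return (complex(x, y), -0.5j * y)


def inverse_via_24(x, y):
    """Right-hand side of (24) for tau - zeta = x*e1 + y*e2, (x, y) != (0, 0)."""
    dz = complex(x, y)                     # sigma(S) - sigma(Z)
    return (1 / dz, 0.5j * y / dz ** 2)    # 1/dz + (i*rho/2) * y / dz**2


def is_identity(p, tol=1e-12):
    """True if p equals the unit 1 + 0*rho up to rounding."""
    return abs(p[0] - 1) < tol and abs(p[1]) < tol


if __name__ == "__main__":
    x, y = 0.3, -1.7
    print(is_identity(mul(embed(x, y), inverse_via_24(x, y))))
```

Multiplying $\tau - \zeta$ by the right-hand side of (24) returns the unit of the algebra, which confirms the sign of the $\rho$-term.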


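We use equalities (24) and (25) to transform integral (11):

$$\begin{aligned}
B[\varphi](\zeta) &= \frac{1}{2\pi i} \int_{\Gamma} \varphi(\widetilde{\sigma}(S))\,\bigl(\widetilde{\sigma}(S) - \widetilde{\sigma}(Z)\bigr)^{-1}\,\widetilde{\sigma}'(S)\,dS\\
&= \frac{1}{2\pi i} \int_{\Gamma} \varphi(\widetilde{\sigma}(S))\,K_1(Z, S)\,dS + \frac{i\rho}{2\pi i} \int_{\Gamma} \varphi(\widetilde{\sigma}(S))\,K_2(Z, S)\,dS\\
&\quad + \frac{1}{4\pi i} \int_{\Gamma} \varphi(\widetilde{\sigma}(S))\,\frac{S + Z}{S(S - Z)}\,dS \qquad \forall\,Z \in \mathbb{U},
\end{aligned} \tag{26}$$

where

$$K_1(Z, S) := \frac{\sigma'(S)}{\sigma(S) - \sigma(Z)} - \frac{S + Z}{2S(S - Z)},$$

$$K_2(Z, S) := \frac{(\sigma_2(S) - \sigma_2(Z))\,\sigma'(S)}{2(\sigma(S) - \sigma(Z))^2} - \frac{\sigma_2'(S)}{2(\sigma(S) - \sigma(Z))}.$$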


Now, assume that $\sigma'(S) \neq 0$ for all $S \in \Gamma$, and the modulus of continuity $\omega_\Gamma(\sigma', \varepsilon)$
satisfies the Dini condition of type (16).

Then there exist constants $c_1$ and $c_2$ such that the following inequality is valid for
all $S, T \in \Gamma$:

$$0 < c_1 \le \left|\frac{\sigma(S) - \sigma(T)}{S - T}\right| \le c_2. \tag{27}$$

In this case, for all $S, T, Z_0 \in \Gamma$ such that $0 < |S - T| < |S - Z_0|$ the following
estimate is also valid:

$$\left|\frac{\sigma(S) - \sigma(T)}{S - T} - \frac{\sigma(S) - \sigma(Z_0)}{S - Z_0}\right| \le c\,\frac{\omega_\Gamma(\sigma', |S - Z_0|)}{|S - Z_0|}\,|T - Z_0|, \tag{28}$$

where the constant $c$ does not depend on $S$, $T$, $Z_0$. These estimates are given in [30], cf.
also [12].

Passing to the limit in inequality (28) as $T$ tends to $S$, we obtain the following
inequality (here, for convenience, we write $T$ instead of $Z_0$):

$$\left|\frac{\sigma(S) - \sigma(T)}{S - T} - \sigma'(S)\right| \le c\,\omega_\Gamma(\sigma', |S - T|) \tag{29}$$

for all $S, T \in \Gamma$, $S \neq T$, where the constant $c$ does not depend on $S$ and $T$.

Let us show that inequalities (27)-(29) remain valid if $T \in \Gamma$ is replaced by
$Z \in \mathbb{U}$ therein.

Denote $d(S, Z) := \dfrac{\sigma(S) - \sigma(Z)}{S - Z}$. For each fixed $S \in \Gamma$ the function $d(S, Z)$ is
analytic with respect to the variable $Z$ in the disk $\mathbb{U}$. Moreover, this function is
continuously extended to the points $Z \in \Gamma$, $Z \neq S$. Finally, the existence of the contour
derivative $\sigma'(S)$ implies the existence of the solid derivative

$$\sigma'(S) := \lim_{Z\to S,\ Z\in\overline{\mathbb{U}}} \frac{\sigma(Z) - \sigma(S)}{Z - S}.$$

This holds due to Tamrazov's Theorem 2.7.6 in [31]. Thus, the function $d(S, Z)$ is
analytic with respect to the variable $Z$ in the disk $\mathbb{U}$ and is continuously extended to
the boundary $\Gamma$. Obviously, the function $1/d(S, Z)$ has the same properties. By the maximum
principle for analytic functions, inequality (27) implies the following inequality:

$$0 < c_1 \le \left|\frac{\sigma(S) - \sigma(Z)}{S - Z}\right| \le c_2 \tag{30}$$

for all $S \in \Gamma$, $Z \in \overline{\mathbb{U}}$, $Z \neq S$, where the constants $c_1$ and $c_2$ do not depend on $S$ and $Z$.
For a fixed $S \in \Gamma$, by $d_S$ we denote the continuous extension of the function
$d(S, Z)$, which is given for all $Z \in \mathbb{U}$, onto the set $\overline{\mathbb{U}}$.
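By Tamrazov's Theorem 2.7.2 in [31], for any fixed $S, Z_0 \in \Gamma$ the solid modulus
of continuity $\omega_{\overline{\mathbb{U}}^{\varepsilon}(Z_0)}(d_S, \varepsilon)$ and the contour modulus of continuity $\omega_{\Gamma^{\varepsilon}(Z_0)}(d_S, \varepsilon)$
are related by the inequality

$$\omega_{\overline{\mathbb{U}}^{\varepsilon}(Z_0)}(d_S, \varepsilon) \le c\,\omega_{\Gamma^{\varepsilon}(Z_0)}(d_S, \varepsilon),$$

where the constant $c$ does not depend on $S$, $Z_0$ and $\varepsilon$. Therefore, for any fixed
different points $S, Z_0 \in \Gamma$ and any $Z \in \mathbb{U}$ such that $|Z - Z_0| = \varepsilon$, we have

$$\begin{aligned}
|d_S(Z) - d_S(Z_0)| &\le \omega_{\overline{\mathbb{U}}^{\varepsilon}(Z_0)}(d_S, \varepsilon) \le c\,\omega_{\Gamma^{\varepsilon}(Z_0)}(d_S, \varepsilon)\\
&= c \sup_{T_1,T_2 \in \Gamma^{\varepsilon}(Z_0),\ |T_1 - T_2| \le \varepsilon} |d_S(T_1) - d_S(Z_0) + d_S(Z_0) - d_S(T_2)|\\
&\le 2c \sup_{T \in \Gamma^{\varepsilon}(Z_0)} |d_S(T) - d_S(Z_0)|.
\end{aligned}$$

Hence, using estimate (28) and properties of a modulus of continuity (cf., e.g., [32,
p. 176]), we obtain the estimate

$$\left|\frac{\sigma(S) - \sigma(Z)}{S - Z} - \frac{\sigma(S) - \sigma(Z_0)}{S - Z_0}\right| \le c\,\frac{\omega_\Gamma(\sigma', |S - Z_0|)}{|S - Z_0|}\,|Z - Z_0| \tag{31}$$

for all different $S, Z_0 \in \Gamma$ and $Z \in \mathbb{U}$, where the constant $c$ does not depend on
$S$, $Z_0$ and $Z$.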

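In the same way as in the proof of estimate (31), for any fixed point $S \in \Gamma$ and any
$Z \in \mathbb{U}$ such that $|Z - S| = \varepsilon$, we obtain the following relations:

$$\begin{aligned}
|\sigma'(S) - d(S, Z)| &= |d_S(S) - d_S(Z)| \le \omega_{\overline{\mathbb{U}}^{\varepsilon}(S)}(d_S, \varepsilon) \le c\,\omega_{\Gamma^{\varepsilon}(S)}(d_S, \varepsilon)\\
&= c \sup_{T_1,T_2 \in \Gamma^{\varepsilon}(S),\ |T_1 - T_2| \le \varepsilon} |d_S(T_1) - \sigma'(S) + \sigma'(S) - d_S(T_2)|\\
&\le 2c \sup_{T \in \Gamma^{\varepsilon}(S)} |\sigma'(S) - d(S, T)|.
\end{aligned}$$

Hence, using estimate (29), we obtain the inequality

$$\left|\frac{\sigma(S) - \sigma(Z)}{S - Z} - \sigma'(S)\right| \le c\,\omega_\Gamma(\sigma', |S - Z|), \tag{32}$$

which holds for all $S \in \Gamma$ and $Z \in \overline{\mathbb{U}}$, $Z \neq S$, where the constant $c$ does not
depend on $S$ and $Z$.
Now, we can prove the main lemma of this section.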


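Lemma 3 Let the conformal mapping $\sigma(T)$ have the nonvanishing continuous contour
derivative $\sigma'(T)$ on the circle $\Gamma$, and let its modulus of continuity satisfy the Dini
condition of type (16). Let a function $\varphi\colon \partial D_\zeta \to \mathbb{R}$ be continuous. Then the
real-valued component-functions $U_1[\Phi]$, $U_3[\Phi]$ and $U_4[\Phi]$ of the function (13) are
continuously extended to the boundary $\partial D_\zeta$.


Proof Taking into account the notation $d_2(S, Z) := \dfrac{\sigma_2(S) - \sigma_2(Z)}{S - Z}$ and equality (6),
we rewrite the equality (26) in the form

$$\begin{aligned}
\Phi(\zeta) &= \frac{e_1}{2\pi i} \int_{\Gamma} \frac{\Omega_1(S, Z)}{S - Z}\,dS + \frac{e_1}{4\pi i} \int_{\Gamma} \frac{\varphi(\widetilde{\sigma}(S))}{S}\,dS\\
&\quad + \frac{2i e_1 - 2e_2}{2\pi i} \int_{\Gamma} \frac{\Omega_2(S, Z)}{S - Z}\,dS + \frac{e_1}{4\pi i} \int_{\Gamma} \varphi(\widetilde{\sigma}(S))\,\frac{S + Z}{S(S - Z)}\,dS\\
&=: I_1 + I_2 + I_3 + I_4 \qquad \forall\,Z \in \mathbb{U},
\end{aligned}$$

where

$$\Omega_1(S, Z) := \varphi(\widetilde{\sigma}(S)) \Bigl(K_1(Z, S) - \frac{1}{2S}\Bigr)(S - Z) = \varphi(\widetilde{\sigma}(S)) \Bigl(\frac{\sigma'(S)}{d(S, Z)} - 1\Bigr),$$

$$\Omega_2(S, Z) := \varphi(\widetilde{\sigma}(S))\,K_2(Z, S)\,(S - Z) = \frac{\varphi(\widetilde{\sigma}(S))}{2} \left[\frac{\sigma'(S)\,d_2(S, Z)}{(d(S, Z))^2} - \frac{\sigma_2'(S)}{d(S, Z)}\right].$$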
Taking into account estimates (30) and (32), for all $S \in \Gamma$, $Z \in \mathbb{U}$, we have

$$|\Omega_1(S, Z)| \le \frac{|\varphi(\widetilde{\sigma}(S))|}{c_1}\,|\sigma'(S) - d(S, Z)| \le c\,\omega_\Gamma(\sigma', |S - Z|),$$

where the constant $c$ does not depend on $S$ and $Z$, i.e., the function $\Omega_1$ satisfies the
estimate of type (20).

Taking into account estimates (30) and (31), for all $S, Z_0 \in \Gamma$, $Z \in \mathbb{U}$ such that
$|S - Z_0| \ge 2|Z - Z_0|$, we have

$$|\Omega_1(S, Z) - \Omega_1(S, Z_0)| \le \frac{|\varphi(\widetilde{\sigma}(S))|\,|\sigma'(S)|}{c_1^2}\,|d(S, Z) - d(S, Z_0)| \le c\,\frac{\omega_\Gamma(\sigma', |S - Z_0|)}{|S - Z_0|}\,|Z - Z_0|,$$

where the constant $c$ does not depend on $S$, $Z_0$ and $Z$, i.e., the function $\Omega_1$ satisfies
the estimate of type (21).

It is similarly established that the function $\Omega_2$ satisfies the estimates of types (20)
and (21).

Therefore, by Lemma 2, the integrals $I_1$ and $I_3$ are continuously extended to the
boundary $\Gamma$ from the domain $\mathbb{U}$. Furthermore, $I_2 = \mathrm{const}$ and $U_1[I_4] \to \varphi(\widetilde{\sigma}(Z_0))/2$
as $Z \to Z_0 \in \Gamma$ due to the corresponding property of the classical Schwarz integral
for a disk. Thus, only $U_2[I_4]$ cannot, in general, be continuously extended to the
boundary $\Gamma$ from the domain $\mathbb{U}$.


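Consider the conformal mapping

$$\tau(w) := \sigma\Bigl(\frac{w - i}{w + i}\Bigr) \qquad \forall\,w \in \{w \in \mathbb{C} : \operatorname{Im} w > 0\} =: \Pi^{+}$$

that is continued to a homeomorphism between the closed upper half-plane
$\{w \in \mathbb{C} : \operatorname{Im} w \ge 0\} \cup \{\infty\}$ and the closure $\overline{D_z}$ of the domain $D_z$. Denote

$$\tau_1(s) := \operatorname{Re}\tau(s), \qquad \tau_2(s) := \operatorname{Im}\tau(s) \qquad \forall\,s \in \mathbb{R}$$

and

$$\widetilde{\tau}(s) := \tau_1(s)\,e_1 + \tau_2(s)\,e_2 \qquad \forall\,s \in \mathbb{R}.$$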


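Now, we can rewrite the equality (26) in the form

$$B[\varphi](\zeta) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} g(s)\,k(w, s)\,ds + \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{g(s)\,(1 + sw)}{(s - w)(s^2 + 1)}\,ds, \qquad (33)$$

where $g(s) := \varphi(\tilde\tau(s))$,

$$k(w, s) := k_1(w, s)\,e_1 + i\rho\,k_2(w, s), \qquad (34)$$

$$k_1(w, s) := \frac{\tau'(s)}{\tau(s) - \tau(w)} - \frac{1 + sw}{(s - w)(s^2 + 1)}, \qquad (35)$$

$$k_2(w, s) := \frac{\tau'(s)\,(\tau_2(s) - \tau_2(w))}{2\,(\tau(s) - \tau(w))^2} - \frac{\tau_2'(s)}{2\,(\tau(s) - \tau(w))}, \qquad (36)$$

and the integrals are understood in the sense of their Cauchy principal values (cf., e.g., [33]), i.e.,

$$\int_{-\infty}^{\infty} g(s, \cdot)\,ds := \lim_{N \to +\infty} \int_{-N}^{N} g(s, \cdot)\,ds.$$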


Inasmuch as the sum $I_1 + I_2 + I_3$ from the proof of Lemma 3 is continuously extended to the boundary $\Gamma$ from the unit disk, the first integral in the right-hand side of equality (33) is continuously extended to the extended real axis $\overline{\mathbb{R}} := \mathbb{R} \cup \{\infty\}$ from the domain $\Pi^+$.

Let $C(\overline{\mathbb{R}})$ denote the Banach space of continuous functions $g\colon \overline{\mathbb{R}} \to \mathbb{C}$ with the norm $\|g\|_{C(\overline{\mathbb{R}})} := \sup_{t \in \mathbb{R}} |g(t)|$.


To complete the section, let us formulate an auxiliary statement proved as Theorem 6.13 in [12].


Lemma 4 Let the conformal mapping $\sigma(T)$ have the nonvanishing continuous contour derivative $\sigma'(T)$ on the circle $\Gamma$, and its modulus of continuity satisfy the Dini condition of type (16). Let the function $k(t, s)$ be defined by the relation (34), where the functions $k_1(t, s)$, $k_2(t, s)$ are defined by the equalities (35), (36), respectively. Then the operator

$$I[g](t) := \int_{-\infty}^{\infty} g(s)\,k(t, s)\,ds \quad \forall\, t \in \mathbb{R}$$

is compact in the space $C(\overline{\mathbb{R}})$.
6 Reducing the (1-3)-Problem to a System of Integral Equations


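Seeking a solution of the (1-3)-problem in the form (13), where $\varphi$ is defined by (14), and using the formula (33) and Lemma 3, we can state that the real-valued component-functions $U_1[\Phi]$ and $U_3[\Phi]$ of the function (13) are continuously extended to the boundary $\partial D_\zeta$, and their limiting values on the boundary are expressed by the formulas

$$U_1(x_0, y_0) = \frac{1}{2}\,g_1(t) + \frac{1}{2\pi}\int_{-\infty}^{\infty} g_1(s)\,\bigl(\operatorname{Im} k_1(t, s) + 2\operatorname{Re} k_2(t, s)\bigr)\,ds - \frac{1}{\pi}\int_{-\infty}^{\infty} g_3(s)\,\operatorname{Im} k_2(t, s)\,ds,$$

$$U_3(x_0, y_0) = \frac{1}{2}\,g_3(t) + \frac{1}{2\pi}\int_{-\infty}^{\infty} g_3(s)\,\bigl(\operatorname{Im} k_1(t, s) - 2\operatorname{Re} k_2(t, s)\bigr)\,ds - \frac{1}{\pi}\int_{-\infty}^{\infty} g_1(s)\,\operatorname{Im} k_2(t, s)\,ds \quad \forall\, t \in \mathbb{R},$$

where $\zeta_0 = x_0 e_1 + y_0 e_2 = \tilde\tau(t)$, $g_l(s) := \varphi_l(\tilde\tau(s))$, $l = 1, 3$.

As a result, we obtain the following system of integral equations for finding the functions $g_1$ and $g_3$:

$$\frac{1}{2}\,g_1(t) + \frac{1}{2\pi}\int_{-\infty}^{\infty} g_1(s)\,\bigl(\operatorname{Im} k_1(t, s) + 2\operatorname{Re} k_2(t, s)\bigr)\,ds - \frac{1}{\pi}\int_{-\infty}^{\infty} g_3(s)\,\operatorname{Im} k_2(t, s)\,ds = \tilde u_1(t),$$

$$\frac{1}{2}\,g_3(t) + \frac{1}{2\pi}\int_{-\infty}^{\infty} g_3(s)\,\bigl(\operatorname{Im} k_1(t, s) - 2\operatorname{Re} k_2(t, s)\bigr)\,ds - \frac{1}{\pi}\int_{-\infty}^{\infty} g_1(s)\,\operatorname{Im} k_2(t, s)\,ds = \tilde u_3(t) \quad \forall\, t \in \mathbb{R}, \qquad (37)$$

where $\tilde u_l(t) := u_l(\tilde\tau(t))$, $l = 1, 3$.

The system (37) is a system of Fredholm integral equations due to Lemma 4.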
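Rewrite the integral equations of the system (37) in expanded form:

$$\frac{1}{2}\,g_1(t) + \frac{1}{2\pi}\operatorname{Im}\int_{-\infty}^{\infty} g_1(s)\,\frac{\tau'(s)}{\tau(s) - \tau(t)}\,ds + \frac{1}{2\pi}\operatorname{Re}\int_{-\infty}^{\infty} g_1(s)\,\frac{\tau'(s)\,(\tau_2(s) - \tau_2(t))}{(\tau(s) - \tau(t))^2}\,ds - \frac{1}{2\pi}\operatorname{Re}\int_{-\infty}^{\infty} g_1(s)\,\frac{\tau_2'(s)}{\tau(s) - \tau(t)}\,ds$$
$$\qquad - \frac{1}{2\pi}\operatorname{Im}\int_{-\infty}^{\infty} g_3(s)\,\frac{\tau'(s)\,(\tau_2(s) - \tau_2(t))}{(\tau(s) - \tau(t))^2}\,ds + \frac{1}{2\pi}\operatorname{Im}\int_{-\infty}^{\infty} g_3(s)\,\frac{\tau_2'(s)}{\tau(s) - \tau(t)}\,ds = \tilde u_1(t),$$

$$\frac{1}{2}\,g_3(t) + \frac{1}{2\pi}\operatorname{Im}\int_{-\infty}^{\infty} g_3(s)\,\frac{\tau'(s)}{\tau(s) - \tau(t)}\,ds - \frac{1}{2\pi}\operatorname{Re}\int_{-\infty}^{\infty} g_3(s)\,\frac{\tau'(s)\,(\tau_2(s) - \tau_2(t))}{(\tau(s) - \tau(t))^2}\,ds + \frac{1}{2\pi}\operatorname{Re}\int_{-\infty}^{\infty} g_3(s)\,\frac{\tau_2'(s)}{\tau(s) - \tau(t)}\,ds$$
$$\qquad - \frac{1}{2\pi}\operatorname{Im}\int_{-\infty}^{\infty} g_1(s)\,\frac{\tau'(s)\,(\tau_2(s) - \tau_2(t))}{(\tau(s) - \tau(t))^2}\,ds + \frac{1}{2\pi}\operatorname{Im}\int_{-\infty}^{\infty} g_1(s)\,\frac{\tau_2'(s)}{\tau(s) - \tau(t)}\,ds = \tilde u_3(t) \quad \forall\, t \in \mathbb{R}.$$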


Then the homogeneous system transposed with the system (37) is of the form:

$$\frac{1}{2}\,h_1(t) + \frac{1}{2\pi}\int_{-\infty}^{\infty} h_1(s)\,\bigl(\operatorname{Im} k_1(s, t) + 2\operatorname{Re} k_2(s, t)\bigr)\,ds - \frac{1}{\pi}\int_{-\infty}^{\infty} h_3(s)\,\operatorname{Im} k_2(s, t)\,ds = 0,$$

$$\frac{1}{2}\,h_3(t) + \frac{1}{2\pi}\int_{-\infty}^{\infty} h_3(s)\,\bigl(\operatorname{Im} k_1(s, t) - 2\operatorname{Re} k_2(s, t)\bigr)\,ds - \frac{1}{\pi}\int_{-\infty}^{\infty} h_1(s)\,\operatorname{Im} k_2(s, t)\,ds = 0 \quad \forall\, t \in \mathbb{R}. \qquad (38)$$


Lemma 5 ([12]). The pair of functions $(h_1, h_3) := (\tau_1', \tau_2')$ satisfies the system of integral equations (38).


It follows from Lemma 5 that the condition

$$\int_{-\infty}^{\infty} \bigl(\tilde u_1(s)\,\tau_1'(s) + \tilde u_3(s)\,\tau_2'(s)\bigr)\,ds = 0$$

is necessary for the solvability of the system of integral equations (37). Passing in this equality to the integration along the boundary $\partial D_\zeta$, we get the equivalent condition

$$\int_{\partial D_\zeta} u_1(x e_1 + y e_2)\,dx + u_3(x e_1 + y e_2)\,dy = 0 \qquad (39)$$

for given functions $u_1$, $u_3$.

It is proved in [23] that the condition (39) is also sufficient for the solvability of the (1-3)-problem for a disk. The next theorem contains assumptions under which the condition (39) is necessary and sufficient for the solvability of the system of integral equations (37) in the Cartesian product $C(\overline{\mathbb{R}}) \times C(\overline{\mathbb{R}})$ and, therefore, for the existence of a solution of the (1-3)-problem in the form (13).


Theorem 1 Let the given functions $u_1\colon \partial D_\zeta \to \mathbb{R}$, $u_3\colon \partial D_\zeta \to \mathbb{R}$ be continuous. Let the conformal mapping $\sigma(T)$ have the nonvanishing continuous contour derivative $\sigma'(T)$ on the circle $\Gamma$, and its modulus of continuity satisfy the Dini condition of type (16).

Assume additionally:

1) all solutions $(g_1, g_3) \in C(\overline{\mathbb{R}}) \times C(\overline{\mathbb{R}})$ of the homogeneous system of equations (37) (with $\tilde u_1 = \tilde u_3 \equiv 0$) are differentiable on $\mathbb{R}$;

2) for every mentioned solution $(g_1, g_3)$ of the homogeneous system of equations (37), the contour derivative $\varphi'$ of the function

$$\varphi(\tau) := g_1(s)\,e_1 + g_3(s)\,e_2, \quad \tau = \tilde\tau(s), \quad \forall\, s \in \mathbb{R}$$

is integrable on $\partial D_\zeta$, and the functions

$$U_1\bigl[B[\varphi'](\zeta)\bigr] - U_4\bigl[B[\varphi'](\zeta)\bigr] \quad \forall\, \zeta \in D_\zeta,$$

$$U_2\bigl[B[\varphi'](\zeta)\bigr] + U_3\bigl[B[\varphi'](\zeta)\bigr] \quad \forall\, \zeta \in \mu_{e_1,e_2} \setminus \overline{D_\zeta}$$

are bounded.

Then the following assertions are true:

(i) the number of linearly independent solutions of the homogeneous system of equations (37) is equal to 1;

(ii) the non-homogeneous system of equations (37) is solvable if and only if the condition (39) is satisfied.


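Proof It follows from Lemma 5 that the transposed system of equations (38) has at least one nontrivial solution. Therefore, by the Fredholm theory, the system of homogeneous equations (37) (with $\tilde u_1 = \tilde u_3 \equiv 0$) has at least one nontrivial solution $g_1 = g_1^0$, $g_3 = g_3^0$. Consider the function

$$\varphi_0(\tau) := g_1^0(s)\,e_1 + g_3^0(s)\,e_2, \quad \tau = \tilde\tau(s), \quad \forall\, s \in \mathbb{R} \qquad (40)$$

corresponding to this solution.

Then the function, which is defined for all $\zeta \in D_\zeta$ by the formula

$$\Phi_0(\zeta) := B[\varphi_0](\zeta), \qquad (41)$$

is a solution of the homogeneous (1-3)-problem (with $u_1 = u_3 \equiv 0$). By virtue of differentiability of the functions $g_1^0$, $g_3^0$ on $\mathbb{R}$, the function $\varphi_0$ has the contour derivative $\varphi_0'(\tau)$ for all $\tau \in \partial D_\zeta$ and the following equality holds:

$$\Phi_0'(\zeta) = B[\varphi_0'](\zeta) \quad \forall\, \zeta \in D_\zeta. \qquad (42)$$

Let $\Phi_1$ be a function monogenic in $D_\zeta$ such that $\Phi_1'(\zeta) = \Phi_0(\zeta)$ for all $\zeta \in D_\zeta$. Then the function $V(x, y) := U_1[\Phi_1(\zeta)]$ is a solution of the homogeneous biharmonic problem (3) (with $\omega_3 = \omega_4 \equiv 0$). Moreover, we have

$$\frac{\partial V(x, y)}{\partial x} = U_1\bigl[B[\varphi_0](\zeta)\bigr], \quad \frac{\partial V(x, y)}{\partial y} = U_3\bigl[B[\varphi_0](\zeta)\bigr] \quad \forall\, \zeta \in D_\zeta. \qquad (43)$$

Now, taking into account the equalities (43) and (8), we obtain a chain of equalities at each of the points $(x, y) \in D$:

$$\Delta V(x, y) := \frac{\partial^2 V(x, y)}{\partial x^2} + \frac{\partial^2 V(x, y)}{\partial y^2} = \frac{\partial}{\partial x}\Bigl(U_1\bigl[B[\varphi_0](\zeta)\bigr]\Bigr) + \frac{\partial}{\partial y}\Bigl(U_3\bigl[B[\varphi_0](\zeta)\bigr]\Bigr)$$
$$= U_1\Bigl[\frac{\partial}{\partial x}\,B[\varphi_0](\zeta)\Bigr] + U_3\Bigl[\frac{\partial}{\partial y}\,B[\varphi_0](\zeta)\Bigr] = U_1\Bigl[\frac{\partial \Phi_0(\zeta)}{\partial x}\Bigr] + U_3\Bigl[\frac{\partial \Phi_0(\zeta)}{\partial y}\Bigr] = U_1\bigl[\Phi_0'(\zeta)\bigr] + U_3\bigl[e_2\,\Phi_0'(\zeta)\bigr].$$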


By virtue of the equalities (42) and $U_3[e_2 a] = U_1[a] - 2U_4[a]$ for all $a \in \mathbb{B}$, the obtained expression for $\Delta V(x, y)$ takes the form

$$\Delta V(x, y) = 2\Bigl(U_1\bigl[B[\varphi_0'](\zeta)\bigr] - U_4\bigl[B[\varphi_0'](\zeta)\bigr]\Bigr) \quad \forall\, \zeta \in D_\zeta,$$

from which we conclude that the Laplacian $\Delta V(x, y)$ is bounded in the domain $D$.
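
The component identity used in this step can be checked numerically. The following is only a sketch: it assumes the usual multiplication table of the biharmonic algebra, $e_1^2 = e_1$, $e_1 e_2 = e_2 e_1 = e_2$, $e_2^2 = e_1 + 2ie_2$, and the decomposition $a = U_1 e_1 + U_2\, i e_1 + U_3 e_2 + U_4\, i e_2$; the helper names `mul` and `U` are illustrative and do not come from the text.

```python
def mul(a, b):
    """Product of a = a1*e1 + a2*e2 and b = b1*e1 + b2*e2 (complex coefficients).

    Assumed table: e1*e1 = e1, e1*e2 = e2*e1 = e2, e2*e2 = e1 + 2i*e2.
    """
    a1, a2 = a
    b1, b2 = b
    return (a1 * b1 + a2 * b2, a1 * b2 + a2 * b1 + 2j * a2 * b2)


def U(k, a):
    """k-th real component of a = U1*e1 + U2*i*e1 + U3*e2 + U4*i*e2."""
    return (a[0].real, a[0].imag, a[1].real, a[1].imag)[k - 1]


E2 = (0j, 1 + 0j)          # the basis element e2
RHO = (2 + 0j, 2j)         # rho = 2*e1 + 2i*e2, expected to be nilpotent

a = (complex(3, -2), complex(5, 7))   # an arbitrary test element
lhs = U(3, mul(E2, a))                # U3[e2 * a]
rhs = U(1, a) - 2 * U(4, a)           # U1[a] - 2*U4[a]
```

Under the same assumed table, `mul(RHO, RHO)` evaluates to the zero element, which is the nilpotency of $\rho$ that the kernel (34) relies on.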

By the uniqueness theorem (see Sect. 4 in [34]), solutions of the homogeneous biharmonic problem (3) in the class of functions $V$, for which $\Delta V(x, y)$ is bounded in $D$, are only constants: $V = \mathrm{const}$. Therefore, taking into account the fact that the functions (41) and (12) are equal, we obtain the equalities $U_1[\Phi_0(\zeta)] = U_3[\Phi_0(\zeta)] \equiv 0$ in the domain $D_\zeta$.

All monogenic functions $\Phi_0$ satisfying the condition $U_1[\Phi_0(\zeta)] \equiv 0$ are described in Lemma 3 in [18]. Taking into account the identity $U_3[\Phi_0(\zeta)] \equiv 0$ also, we obtain the equality

$$\Phi_0(\zeta) = i k \zeta + i\,(n_1 e_1 + n_2 e_2) \quad \forall\, \zeta \in D_\zeta, \qquad (44)$$

where $k$, $n_1$ and $n_2$ are real numbers.

Let us show that the constant $k$ is not equal to zero in the equality (44). Assume the contrary: $k = 0$. Then, by Corollary 1, the function (41), which is considered for all $\zeta \in \mu_{e_1,e_2} \setminus \overline{D_\zeta}$, is a solution of the (2-4)-problem with the following boundary data:

$$U_2\bigl[\Phi_0^-(\zeta)\bigr] = n_1, \quad U_4\bigl[\Phi_0^-(\zeta)\bigr] = n_2 \quad \forall\, \zeta \in \partial D_\zeta. \qquad (45)$$

Using the boundedness of the function $U_2\bigl[B[\varphi_0'](\zeta)\bigr] + U_3\bigl[B[\varphi_0'](\zeta)\bigr]$ in the domain $\mu_{e_1,e_2} \setminus \overline{D_\zeta}$ and arguments analogous to those used in the proof of the equality (44), we establish that a function $\Phi_0$, which is monogenic in $\mu_{e_1,e_2} \setminus \overline{D_\zeta}$ and satisfies the boundary conditions (45), is of the form

$$\Phi_0(\zeta) = k_1 \zeta + m_1 e_1 + m_2 e_2 + i n_1 e_1 + i n_2 e_2 \quad \forall\, \zeta \in \mu_{e_1,e_2} \setminus \overline{D_\zeta},$$

where $k_1$, $m_1$ and $m_2$ are real numbers. Inasmuch as the function

$$\Phi_0(\zeta) = B[\varphi_0](\zeta) \quad \forall\, \zeta \in \mu_{e_1,e_2} \setminus \overline{D_\zeta}$$

vanishes at infinity, we get $k_1 = m_1 = m_2 = n_1 = n_2 = 0$.

Therefore, as a corollary of the Sokhotski–Plemelj formulas (18) for the function (41), we get the equalities

$$\varphi_0(\tau) = \Phi_0^+(\tau) - \Phi_0^-(\tau) = 0 \quad \forall\, \tau \in \partial D_\zeta. \qquad (46)$$

As a result of (46), the identity $\varphi_0(\tau) \equiv 0$ contradicts the non-triviality of at least one of the functions $g_1^0$, $g_3^0$ in the definition (40) of the function $\varphi_0$.

Hence, the assumption $k = 0$ is not true. Therefore, the equality (44) holds with $k \ne 0$.

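Let $g_1 = g_1^1$, $g_3 = g_3^1$ be a nontrivial solution of the homogeneous system (37), and let $g_1^1$, $g_3^1$ be different from $g_1^0$, $g_3^0$. Then for the function $\varphi^1(\tau) := g_1^1(s)\,e_1 + g_3^1(s)\,e_2$, where $\tau = \tilde\tau(s)$ for all $s \in \mathbb{R}$, the following equality of type (44) holds:

$$B[\varphi^1](\zeta) = i \tilde k \zeta + i\,(\tilde n_1 e_1 + \tilde n_2 e_2) \quad \forall\, \zeta \in D_\zeta, \qquad (47)$$

where $\tilde k$, $\tilde n_1$ and $\tilde n_2$ are real numbers.

Define a function $\tilde\varphi$ by the equality

$$\tilde\varphi(\tau) := \varphi^1(\tau) - \frac{\tilde k}{k}\,\varphi_0(\tau) \quad \forall\, \tau \in \partial D_\zeta.$$

Taking into account the equalities (44) and (47), we obtain

$$B[\tilde\varphi](\zeta) = i e_1 \Bigl(\tilde n_1 - \frac{\tilde k}{k}\,n_1\Bigr) + i e_2 \Bigl(\tilde n_2 - \frac{\tilde k}{k}\,n_2\Bigr) \quad \forall\, \zeta \in D_\zeta.$$

Further, in the same way as for the assumption $k = 0$ in the equality (44), we obtain the identity $\tilde\varphi(\tau) \equiv 0$, which implies the equalities $g_m^1(s) = \frac{\tilde k}{k}\,g_m^0(s)$ for all $s \in \mathbb{R}$ and $m \in \{1, 3\}$. The assertion (i) is proved.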

Due to the assertion (i), by the Fredholm theory, the number of linearly indepen- 
dent solutions of the transposed system of equations (38) is also equal to 1. Such a 
nontrivial solution is described in Lemma 5. Therefore, the condition (39) is not only 
necessary but also sufficient for the solvability of the system of integral equations 
(37). 


Remark In Theorem 7.11 [12], for the (1-3)-problem in the sense of Kovalev, we proved assertions similar to those in Theorem 1 but under the additional assumptions that the functions $u_l$, $l \in \{1, 3\}$, satisfy the Dini condition of type (16) and the following condition on the conformal mapping $\sigma(T)$ is satisfied


2 
[AP no an <0 (48) 
7 7 


that is stronger than the condition of type (16) used in Theorem 1. 
7 Reducing the (1-4)-Problem to a System of Integral Equations


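We shall find solutions of the (1-4)-problem in the class of functions represented in the form (13), where $\varphi$ is defined by (15).

Using the formula (33) and Lemma 3, we can state that the real-valued component-functions $U_1[\Phi]$ and $U_4[\Phi]$ of the function (13) are continuously extended to the boundary $\partial D_\zeta$.

Now, we single out the real-valued component-functions $U_l[\Phi]$, $l \in \{1, 4\}$, and after the substitution of their limiting values on the boundary $\partial D_\zeta$ into the boundary conditions of the (1-4)-problem, we obtain the following system of integral equations for finding the functions $g_1(s) := \varphi_1(\tilde\tau(s))$ and $g_4(s) := \varphi_4(\tilde\tau(s))$ for all $s \in \mathbb{R}$:

$$\frac{1}{2}\,g_1(t) + \frac{1}{2\pi}\int_{-\infty}^{\infty} g_1(s)\,\bigl(\operatorname{Im} k_1(t, s) + 2\operatorname{Re} k_2(t, s)\bigr)\,ds - \frac{1}{\pi}\int_{-\infty}^{\infty} g_4(s)\,\operatorname{Re} k_2(t, s)\,ds = \tilde u_1(t),$$

$$\frac{1}{2}\,g_4(t) + \frac{1}{2\pi}\int_{-\infty}^{\infty} g_4(s)\,\bigl(\operatorname{Im} k_1(t, s) - 2\operatorname{Re} k_2(t, s)\bigr)\,ds + \frac{1}{\pi}\int_{-\infty}^{\infty} g_1(s)\,\operatorname{Re} k_2(t, s)\,ds = \tilde u_4(t) \quad \forall\, t \in \mathbb{R}, \qquad (49)$$

where $\tilde u_l(t) := u_l(\tilde\tau(t))$, $l \in \{1, 4\}$.

The system (49) is a system of Fredholm integral equations due to Lemma 4.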


Theorem 2 Assume that the functions $u_l\colon \partial D_\zeta \to \mathbb{R}$, $l \in \{1, 4\}$, are continuous. Also, assume that the conformal mapping $\sigma(T)$ has the nonvanishing continuous contour derivative $\sigma'(T)$ on the circle $\Gamma$, and its modulus of continuity satisfies the Dini condition of type (16). Then the system of Fredholm integral equations (49) has a unique solution in $C(\overline{\mathbb{R}})$.


Proof To prove that the system (49) has a unique solution, consider the homogeneous system of equations (49) with $\tilde u_1 = \tilde u_4 \equiv 0$.

Let $g_1 = g_1^0$, $g_4 = g_4^0$ be a solution of such a system. Consider the function

$$\varphi_0(\tau) := g_1^0(s)\,e_1 + g_4^0(s)\,i e_2, \quad \tau = \tilde\tau(s), \quad \forall\, s \in \mathbb{R}$$

corresponding to this solution.

Then the function, which is defined for all $\zeta \in D_\zeta$ by the formula (41), is a solution of the homogeneous (1-4)-problem with $u_1 = u_4 \equiv 0$ that corresponds to the homogeneous system of equations (49). Taking into account Theorem 5.2 in [22], where the formula of solutions of the homogeneous (1-4)-problem is established, we have the equality $\Phi_0(\zeta) = a_1\, i e_1 + a_2\, e_2$ for all $\zeta \in D_\zeta$. Therefore, by Corollary 1, we obtain the following relations:

$$\varphi_0(\zeta_0) = B^+[\varphi_0](\zeta_0) - B^-[\varphi_0](\zeta_0) = a_1\, i e_1 + a_2\, e_2 - B^-[\varphi_0](\zeta_0) \quad \forall\, \zeta_0 \in \partial D_\zeta,$$

which imply the equality

$$B^-[\varphi_0](\zeta_0) = -g_1^0(t)\,e_1 + a_1\, i e_1 + a_2\, e_2 - g_4^0(t)\,i e_2, \quad \zeta_0 = \tilde\tau(t), \quad \forall\, t \in \mathbb{R}. \qquad (50)$$

Thus, the function (41) is monogenic in $\mu_{e_1,e_2} \setminus \overline{D_\zeta}$ and satisfies the equalities

$$U_1\bigl[i\Phi_0^-(\zeta_0)\bigr] = -a_1, \quad U_4\bigl[i\Phi_0^-(\zeta_0)\bigr] = a_2 \quad \forall\, \zeta_0 \in \partial D_\zeta.$$

Now, taking into account Theorem 5.2 in [22], one can easily conclude that $B[\varphi_0](\zeta)$ is constant in $\mu_{e_1,e_2} \setminus \overline{D_\zeta}$. Moreover, $B[\varphi_0](\zeta) \equiv 0$ in $\mu_{e_1,e_2} \setminus \overline{D_\zeta}$ because the function $B[\varphi_0](\zeta)$ vanishes at infinity.

Finally, after substituting 0 into the left-hand side of equality (50) we conclude that $g_1^0 = g_4^0 \equiv 0$, i.e. the homogeneous system of equations (49) has only the trivial solution.

Therefore, by the Fredholm theory, the non-homogeneous system of equations (49) has a unique solution.


Remark In Theorem 3.1 [26], for the (1-4)-problem in the sense of Kovalev, we proved assertions similar to those in Theorem 2 but under the additional assumptions that the functions $u_l$, $l \in \{1, 4\}$, satisfy the Dini condition of type (16) and that the condition (48) on the conformal mapping $\sigma(T)$ is satisfied, which is stronger than the condition of type (16) used in Theorem 2.
On Algorithms Related to Expressibility of Functions of Diagonalizable Algebras


Andrei Rusu and Elena Rusu 


Abstract The connection of diagonalizable algebras with propositional provability logics is well known. The notion of expressibility of functions and related algorithmic problems are very well studied in the case of boolean functions. We propose and analyse algorithms for detecting functional expressibility of functions of a 4-valued diagonalizable algebra and examine other problems connected to iterative algebras based on diagonalizable algebras.


Keywords Diagonalizable algebra · Provability logic · Expressibility of functions · Problems related to expressibility of functions · Iterative Post algebra · Function algebra · Clone · Completeness problem
1 Introduction


The study of compositions of functions is important for three branches of mathematics: the theory of many-valued logics, automata theory, and universal algebra. In the articles [10, 11], E. Post, one of the founders of the mathematical theory of many-valued logics, considered two problems. The first problem reads: which functions are obtainable by finite compositions from a given collection of initial functions? The second reads: describe all composition-closed classes of functions on a fixed set A. These problems remain relevant when A comprises more than two elements and is endowed with algebraic operations.

Since composition is not an algebraic operation, Mal'cev [8] suggested considering, instead of classes of functions with composition, special algebras whose underlying sets are composition-closed sets of functions. A. I. Mal'cev called these algebras iterative and pre-iterative algebras.

The connection of diagonalizable algebras [7] to propositional provability logic [15] is well known. In the present paper we investigate an infinite series of diagonalizable algebras related to the second problem stated by E. Post relative to boolean functions. We give sufficient conditions for terms representing constants to be expressible in the considered diagonalizable algebra via a fixed system of terms.


2 Preliminaries 


2.1 Diagonalizable Algebras 


A diagonalizable algebra [7] $\mathfrak{D}$ is a boolean algebra $\mathfrak{A} = (A; \&, \vee, \supset, \neg, 0, 1)$ with an additional operation $\Box$ satisfying the relations:

$$\Box(x \supset y) \le (\Box x \supset \Box y),$$
$$\Box x \le \Box\Box x,$$
$$\Box(\Box x \supset x) = \Box x,$$
$$\Box 1 = 1,$$

where 1 is the unit of $\mathfrak{A}$. The diagonalizable algebras serve as algebraic models for propositional provability logic GL [15].
Let us consider the diagonalizable algebras:

$$\mathfrak{B}_2, \mathfrak{B}_3, \dots, \mathfrak{B}_i, \dots \qquad (1)$$


The support of these algebras consists of ordered binary sets $\alpha = (\alpha_1, \dots, \alpha_n)$, $\alpha_i \in \{0, 1\}$ $(i = 1, \dots, n)$. Boolean operations over elements of the algebra $\mathfrak{B}_n$ are defined component-wise, and $\Box\alpha = (1, \alpha_1, \dots, \alpha_1)$.

Consider also the diagonalizable algebra $\mathfrak{B}_\infty$, whose base is formed of all infinite binary sequences $(\alpha_1, \alpha_2, \dots)$; boolean operations over infinite binary sequences are defined component-wise, and for any element $(\alpha_1, \alpha_2, \dots)$ we have $\Box(\alpha_1, \alpha_2, \dots, \alpha_n, \dots) = (1, \alpha_1, \dots, \alpha_1, \dots)$.

Let us denote the elements $(0, 0), (0, 1), (1, 0), (1, 1)$ of the algebra $\mathfrak{B}_2$ by $0, \rho, \sigma, 1$, where $\rho$ and $\sigma$ are incomparable. The boolean operations over elements of $\mathfrak{B}_2$ are defined component-wise, and

$$\Box 0 = \Box\rho = \sigma, \quad \Box\sigma = \Box 1 = 1.$$
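
The definition of $\mathfrak{B}_n$ can be made concrete with a short Python sketch. It assumes the reading $\Box(\alpha_1, \dots, \alpha_n) = (1, \alpha_1, \dots, \alpha_1)$ for the box operation; the function names (`box`, `impl`, `is_diagonalizable`, ...) are illustrative and not taken from the text. The check confirms by brute force that the four defining relations of a diagonalizable algebra hold for small $n$.

```python
from itertools import product


def box(a):
    """Assumed reading of the text: box(a1, ..., an) = (1, a1, ..., a1)."""
    return (1,) + (a[0],) * (len(a) - 1)


def neg(a):
    return tuple(1 - x for x in a)


def conj(a, b):
    return tuple(x & y for x, y in zip(a, b))


def impl(a, b):
    return tuple((1 - x) | y for x, y in zip(a, b))


def leq(a, b):
    """Boolean order: a <= b holds component-wise."""
    return all(x <= y for x, y in zip(a, b))


def is_diagonalizable(n):
    """Brute-force check of the four defining relations on B_n."""
    elems = list(product((0, 1), repeat=n))
    one = (1,) * n
    for x in elems:
        if not leq(box(x), box(box(x))):
            return False
        if box(impl(box(x), x)) != box(x):
            return False
        for y in elems:
            if not leq(box(impl(x, y)), impl(box(x), box(y))):
                return False
    return box(one) == one


# Names of the four elements of B_2 as used in the text.
NAMES = {(0, 0): "0", (0, 1): "rho", (1, 0): "sigma", (1, 1): "1"}
box_table = {NAMES[a]: NAMES[box(a)] for a in NAMES}
```

For $n = 2$ the dictionary `box_table` reproduces the table $\Box 0 = \Box\rho = \sigma$, $\Box\sigma = \Box 1 = 1$.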
As usual, any term $t$ of a diagonalizable algebra $\mathfrak{D}$ defines a function $f$ over the support of $\mathfrak{D}$. In this case we say that $f$ is represented by $t$ on $\mathfrak{D}$, or that $f$ is expressed by $t$. Obviously, not every function over the support of $\mathfrak{D}$ can be represented by a term of the algebra $\mathfrak{D}$. In the next example we show that this happens, for instance, for the diagonalizable algebra $\mathfrak{B}_2$.


Example 1 Consider the diagonalizable algebra $\mathfrak{B}_2$ defined earlier and let $f$ be a function over $\{0, \rho, \sigma, 1\}$ such that $f(0) = f(\rho) = f(\sigma) = 1$ and $f(1) = 0$. Then $f$ cannot be expressed by any term of the algebra $\mathfrak{B}_2$.


Proof Let us note the following property of terms $t$ of $\mathfrak{B}_2$ (denoted by Prop): for any elements $\alpha_{ij}$ $(i = 1, 2;\ j = 1, \dots, n)$ with $\Box\alpha_{1j} = \Box\alpha_{2j}$ for all $j$, the equality $\Box t(\alpha_{11}, \dots, \alpha_{1n}) = \Box t(\alpha_{21}, \dots, \alpha_{2n})$ also holds.

It can be easily verified that each of the primary terms $(x \& y)$, $(x \vee y)$, $(x \supset y)$, $(\neg x)$ and $(\Box x)$ has the above-mentioned property Prop. So, any other term also has property Prop.

Now let us verify that $f$ does not have property Prop. It is easy to verify that $\Box\sigma = \Box 1$, but $\Box f(\sigma) \ne \Box f(1)$. So, the function $f$ cannot be represented by a term of the algebra $\mathfrak{B}_2$.
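
The argument of Example 1 is finite and can be replayed mechanically. The sketch below encodes $0, \rho, \sigma, 1$ as the pairs $(0,0), (0,1), (1,0), (1,1)$ and uses $\Box(\alpha_1, \alpha_2) = (1, \alpha_1)$, which agrees with the table $\Box 0 = \Box\rho = \sigma$, $\Box\sigma = \Box 1 = 1$; the helper `has_prop` is a hypothetical name for the property Prop.

```python
from itertools import product

# Elements of B_2: 0 = (0,0), rho = (0,1), sigma = (1,0), 1 = (1,1).
B2 = list(product((0, 1), repeat=2))


def box(a):
    return (1, a[0])


def neg(a):
    return (1 - a[0], 1 - a[1])


def conj(a, b):
    return (a[0] & b[0], a[1] & b[1])


def disj(a, b):
    return (a[0] | b[0], a[1] | b[1])


def impl(a, b):
    return disj(neg(a), b)


def has_prop(f, arity):
    """Prop: box(a_j) == box(b_j) for all j implies box(f(a)) == box(f(b))."""
    for a in product(B2, repeat=arity):
        for b in product(B2, repeat=arity):
            same_boxes = all(box(x) == box(y) for x, y in zip(a, b))
            if same_boxes and box(f(*a)) != box(f(*b)):
                return False
    return True


def f(x):
    """The function of Example 1: f(0) = f(rho) = f(sigma) = 1, f(1) = 0."""
    return (0, 0) if x == (1, 1) else (1, 1)


primitives_ok = all([
    has_prop(conj, 2), has_prop(disj, 2), has_prop(impl, 2),
    has_prop(neg, 1), has_prop(box, 1),
])
f_ok = has_prop(f, 1)
```

Here `primitives_ok` comes out true while `f_ok` is false, which is exactly the dichotomy the proof uses.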


A function $f$ is said to be constant if for any substitution of variables of $f$ with elements of $\mathfrak{D}$ the value of $f$ does not change. For example, the terms $(p \& \neg p)$, $(p \supset p)$, $(\Box\neg(p \supset p))$ and $(\neg\Box\neg(p \supset p))$ are constants on the algebra $\mathfrak{B}_2$, denoted as elements of $\mathfrak{B}_2$ by $0, 1, \sigma, \rho$.
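
As a quick illustration, the four constant terms can be evaluated on every element of $\mathfrak{B}_2$. This is a sketch under the same encoding as before ($\rho = (0,1)$, $\sigma = (1,0)$, $\Box(\alpha_1, \alpha_2) = (1, \alpha_1)$); the string labels used as dictionary keys are illustrative only.

```python
B2 = [(0, 0), (0, 1), (1, 0), (1, 1)]
NAMES = {(0, 0): "0", (0, 1): "rho", (1, 0): "sigma", (1, 1): "1"}


def box(a):
    return (1, a[0])


def neg(a):
    return (1 - a[0], 1 - a[1])


def conj(a, b):
    return (a[0] & b[0], a[1] & b[1])


def impl(a, b):
    return ((1 - a[0]) | b[0], (1 - a[1]) | b[1])


terms = {
    "p & ~p": lambda p: conj(p, neg(p)),
    "p > p": lambda p: impl(p, p),
    "box ~(p > p)": lambda p: box(neg(impl(p, p))),
    "~box ~(p > p)": lambda p: neg(box(neg(impl(p, p)))),
}

# For each term, collect the set of values it takes over all p in B_2.
values = {name: {NAMES[t(p)] for p in B2} for name, t in terms.items()}
```

Each set in `values` is a singleton, so every listed term is indeed constant on $\mathfrak{B}_2$.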

We say [6] that two diagonalizable algebras $\mathfrak{D}_1$ and $\mathfrak{D}_2$ are n-equal if they contain the same distinct terms of $n$ variables.


2.2 Propositional Provability Logics Associated to 
Diagonalizable Algebras 


As usual [15], the calculus of the propositional provability logic GL is based on formulas built of variables $p_1, p_2, \dots$ and logical connectives $\&, \vee, \supset, \neg, \Box$. The axioms of GL are the axioms of the classical propositional logic:

$$(p \supset (q \supset p)),$$
$$((p \supset (q \supset r)) \supset ((p \supset q) \supset (p \supset r))),$$
$$((\neg q \supset \neg p) \supset ((\neg q \supset p) \supset q)),$$

and three additional axioms related to $\Box$:

$$(\Box(p \supset q) \supset (\Box p \supset \Box q)),$$
$$(\Box p \supset \Box\Box p),$$
$$(\Box(\Box p \supset p) \supset \Box p).$$

The rules of inference of the logic GL are:

1. the substitution rule, allowing one to pass from any formula F to the result of substituting in F any formula G for a variable p of F (denoted by F(p/G), or by F(G)),

2. the modus ponens rule, which permits passing from two formulas F and $F \supset G$ to the formula G, and

3. the rule of necessitation, admitting the passage from a formula F to $\Box F$.


As usual, by the logic of the calculus GL we understand the set of all formulas that can be deduced in the calculus GL; it is closed with respect to the rules of inference of GL, and we call it the propositional provability logic of Gödel–Löb, denoted also by GL. We say that a logic $L_2$ is an extension of the logic $L_1$ if $L_1 \subseteq L_2$ (as sets).

It is known [1, 2] that diagonalizable algebras can serve as algebraic models for propositional provability logic. Interpreting the logical connectives of a formula F by the corresponding operations on a diagonalizable algebra $\mathfrak{D}$, we can evaluate any formula of GL on any algebra $\mathfrak{D}$. If for any evaluation of variables of F by elements of $\mathfrak{D}$ the resulting value of the formula F on $\mathfrak{D}$ is 1, we say that F is valid on $\mathfrak{D}$. The set of all formulas valid on the given diagonalizable algebra $\mathfrak{D}$ is an extension of GL [9], denoted by $L\mathfrak{D}$. So, if the diagonalizable algebra $\mathfrak{D}_2$ is a subalgebra of the diagonalizable algebra $\mathfrak{D}_1$, then $L\mathfrak{D}_1 \subseteq L\mathfrak{D}_2$, i.e. $L\mathfrak{D}_2$ is an extension of $L\mathfrak{D}_1$.
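
Validity on a finite algebra is decidable by exhaustive evaluation. The sketch below does this for $\mathfrak{B}_n$, assuming $\Box(\alpha_1, \dots, \alpha_n) = (1, \alpha_1, \dots, \alpha_1)$; the helper `valid` and the formula names are illustrative. It checks the three modal axioms of GL and, for contrast, the formula $\Box p \supset p$, which is not a theorem of GL.

```python
from itertools import product


def box(a):
    """Assumed box of B_n: (a1, ..., an) -> (1, a1, ..., a1)."""
    return (1,) + (a[0],) * (len(a) - 1)


def impl(a, b):
    return tuple((1 - x) | y for x, y in zip(a, b))


def valid(formula, n_vars, n):
    """True if `formula` evaluates to the unit of B_n under every valuation."""
    elems = list(product((0, 1), repeat=n))
    one = (1,) * n
    return all(formula(*v) == one for v in product(elems, repeat=n_vars))


def k_axiom(p, q):
    return impl(box(impl(p, q)), impl(box(p), box(q)))


def four_axiom(p):
    return impl(box(p), box(box(p)))


def loeb_axiom(p):
    return impl(box(impl(box(p), p)), box(p))


def reflexivity(p):
    return impl(box(p), p)


report = {
    n: (valid(k_axiom, 2, n), valid(four_axiom, 1, n),
        valid(loeb_axiom, 1, n), valid(reflexivity, 1, n))
    for n in (2, 3, 4)
}
```

For each tested $n$ the first three entries of `report[n]` are true and the last one is false (take $p = 0$).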

Taking into account relations (1) we have the next one:

$$L\mathfrak{B}_2 \supseteq L\mathfrak{B}_3 \supseteq \dots \supseteq L\mathfrak{B}_i \supseteq \dots \qquad (2)$$


2.3 Iterative Post Algebras 


Since composition is not an algebraic operation, Mal'cev [8] suggested considering, instead of classes of functions with composition, special algebras whose underlying sets are composition-closed sets of functions. A. I. Mal'cev called these algebras iterative and pre-iterative algebras and described them as follows:

Given an arbitrary set A, denote by $P_A^{(n)}$ the set of all n-ary functions that are defined on A and have values in A. Put $P_A = \bigcup_n P_A^{(n)}$. The iterative Post algebra is the algebra $\mathfrak{P}_A = (P_A; \zeta, \tau, \Delta, \nabla, *)$ whose operations for an n-ary function f with $n > 1$ and an m-ary function g are defined as follows:

$$(\zeta f)(x_1, \dots, x_n) = f(x_2, x_3, \dots, x_n, x_1),$$
$$(\tau f)(x_1, \dots, x_n) = f(x_2, x_1, x_3, \dots, x_n),$$
$$(\Delta f)(x_1, \dots, x_{n-1}) = f(x_1, x_1, x_2, \dots, x_{n-1}),$$
$$(\nabla f)(x_1, \dots, x_{n+1}) = f(x_2, x_3, \dots, x_{n+1}),$$
$$(f * g)(x_1, \dots, x_{n+m-1}) = f(g(x_1, \dots, x_m), x_{m+1}, \dots, x_{m+n-1}).$$

The pre-iterative Post algebra is the algebra $\mathfrak{P}_A^* = (P_A; \zeta, \tau, \Delta, *)$. If the set A comprises k elements then we say that these algebras have finite rank k.

The subalgebras of the algebras $\mathfrak{P}_A$ and $\mathfrak{P}_A^*$ are referred to as the iterative and pre-iterative algebras respectively. A clone is a pre-iterative algebra containing all selectors $e_i^n(x_1, \dots, x_n) = x_i$. At present, clone theory is an important and popular direction in universal algebra. However, in applications, for example in automata theory, it is most natural to use iterative algebras, since such an algebra, containing a function f, also contains all functions that differ from f in fictitious variables.
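
The five operations translate directly into higher-order functions. The following is a minimal sketch: arities are passed explicitly because Python callables do not carry them, and the convention $\zeta f = \tau f = \Delta f = f$ for unary $f$ is an assumption added here (the text defines the operations only for $n > 1$).

```python
def zeta(f, n):
    """(zeta f)(x1, ..., xn) = f(x2, x3, ..., xn, x1)."""
    if n == 1:
        return f  # assumed convention for unary functions
    return lambda *x: f(*x[1:], x[0])


def tau(f, n):
    """(tau f)(x1, ..., xn) = f(x2, x1, x3, ..., xn)."""
    if n == 1:
        return f
    return lambda *x: f(x[1], x[0], *x[2:])


def delta(f, n):
    """(Delta f)(x1, ..., x_{n-1}) = f(x1, x1, x2, ..., x_{n-1}); arity n - 1."""
    if n == 1:
        return f
    return lambda *x: f(x[0], *x)


def nabla(f, n):
    """(nabla f)(x1, ..., x_{n+1}) = f(x2, ..., x_{n+1}); arity n + 1."""
    return lambda *x: f(*x[1:])


def star(f, n, g, m):
    """(f * g)(x1, ..., x_{n+m-1}) = f(g(x1, ..., xm), x_{m+1}, ..., x_{m+n-1})."""
    return lambda *x: f(g(*x[:m]), *x[m:])


# Illustrative functions on the integers.
def f3(x, y, z):
    return x - 2 * y + 3 * z


def g2(a, b):
    return a * b
```

For instance, `zeta(f3, 3)(1, 2, 3)` evaluates `f3(2, 3, 1)`, and `star(f3, 3, g2, 2)(2, 3, 4, 5)` evaluates `f3(6, 4, 5)`.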


2.4 Relations on Diagonalizable Algebras 


As usual [5], a relation $R(x_1, \dots, x_m)$ on a diagonalizable algebra $\mathfrak{D} = (D; \&, \vee, \supset, \neg, \Box)$ is any subset of $D^m$. We say that the term function $f(x_1, \dots, x_n)$ conserves the relation $R(x_1, \dots, x_m)$ on the algebra $\mathfrak{D}$ if for any elements $\alpha_{ij} \in D$ $(i = 1, \dots, m;\ j = 1, \dots, n)$ for which the following relations hold

$$R(\alpha_{11}, \dots, \alpha_{m1}), \dots, R(\alpha_{1n}, \dots, \alpha_{mn})$$

we have also the relation

$$R(f(\alpha_{11}, \dots, \alpha_{1n}), \dots, f(\alpha_{m1}, \dots, \alpha_{mn})).$$

We read the symbol == as "is defined as". Also denote the term function $((x \supset y) \& (y \supset x))$ by $(x \sim y)$. Given a finite diagonalizable algebra $\mathfrak{D}$ and a relation $R(x_1, \dots, x_m)$ on it, we can associate with it a matrix M consisting of all elements $\alpha_{ij}$ of $\mathfrak{D}$ such that $R(\alpha_{1j}, \dots, \alpha_{mj})$, where $j = 1, \dots, k$.


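Example 2 Consider the relation $\Box x = \Box y$ on $\mathfrak{B}_2$. The corresponding matrix on $\mathfrak{B}_2$ is

$$\begin{pmatrix} 0 & 0 & \rho & \rho & \sigma & \sigma & 1 & 1 \\ 0 & \rho & 0 & \rho & \sigma & 1 & \sigma & 1 \end{pmatrix}$$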
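
The matrix of a relation and the conservation test are both finite computations. The sketch below builds the matrix of $\Box x = \Box y$ on $\mathfrak{B}_2$ and tests whether a function conserves a relation; it assumes the encoding $\rho = (0,1)$, $\sigma = (1,0)$ with $\Box(\alpha_1, \alpha_2) = (1, \alpha_1)$, and the names `matrix` and `conserves` are illustrative.

```python
from itertools import product

B2 = [(0, 0), (0, 1), (1, 0), (1, 1)]
NAMES = {(0, 0): "0", (0, 1): "rho", (1, 0): "sigma", (1, 1): "1"}


def box(a):
    return (1, a[0])


# The binary relation box x = box y, stored as a set of pairs (columns).
R = {(x, y) for x in B2 for y in B2 if box(x) == box(y)}


def matrix(rel):
    """Rows of the matrix whose columns are the tuples of `rel` (sorted)."""
    cols = sorted(rel)
    m = len(cols[0])
    return [[NAMES[c[i]] for c in cols] for i in range(m)]


def conserves(f, arity, rel):
    """f conserves rel if applying f row-wise to any `arity` columns of rel
    again gives a column of rel."""
    m = len(next(iter(rel)))
    for cols in product(rel, repeat=arity):
        image = tuple(f(*(c[i] for c in cols)) for i in range(m))
        if image not in rel:
            return False
    return True


def f(x):
    """Function of Example 1: value 1 everywhere except f(1) = 0."""
    return (0, 0) if x == (1, 1) else (1, 1)


M = matrix(R)
```

The two rows of `M` coincide with the rows of the matrix in Example 2; `conserves(box, 1, R)` holds, while the function of Example 1 fails the test.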
3 Preliminary Results


3.1 Representation of 4-Valued Functions by Terms of Diagonalizable Algebra $\mathfrak{B}_2$


As was shown in Example 1, not every function may be represented by a term on the diagonalizable algebra $\mathfrak{B}_2$.

Conditions for an arbitrary 4-valued function to be represented by a term of the diagonalizable algebra $\mathfrak{B}_2$ are determined in the next statement.


Theorem 1 A 4-valued function $f(x_1, \dots, x_n)$ can be represented by a term $t(x_1, \dots, x_n)$ of the diagonalizable algebra $\mathfrak{B}_2$ if and only if $f$ conserves the relation $R_{\mathfrak{B}_2}(x, y)$ (i.e. the relation $\Box x = \Box y$) on $\mathfrak{B}_2$.


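Proof Necessity. It can be easily verified that the terms $(p \& q)$, $(p \vee q)$, $(p \supset q)$, $(\neg p)$ and $(\Box p)$ conserve the relation $\Box x = \Box y$ on the algebra $\mathfrak{B}_2$. Since any term $t$ is built from them, the term $t$ itself must also conserve the same relation on $\mathfrak{B}_2$.

Sufficiency. Let us note that any element of the algebra $\mathfrak{B}_2$ corresponds to a constant term in $\mathfrak{B}_2$, so in the sequel we denote the elements of the algebra $\mathfrak{B}_2$ and the constant terms of $\mathfrak{B}_2$ by the same symbols. Suppose the function $f(p_1, \dots, p_n)$ conserves the relation $\Box x = \Box y$ on the algebra $\mathfrak{B}_2$. We will show in the following how to design the term $t(p_1, \dots, p_n)$ which represents the function $f$ on the algebra $\mathfrak{B}_2$.

Examine an arbitrary fixed set $\alpha = (\alpha_1, \dots, \alpha_n)$ of elements of $\mathfrak{B}_2$. Let $f(\alpha_1, \dots, \alpha_n) = \delta$ and consider the term $\bigl(\&_{i=1}^{n} \boxdot(p_i \sim \alpha_i)\bigr) \& \delta$, denoted by $C^\alpha(p_1, \dots, p_n)$, where $\boxdot x$ stands for $(x \& \Box x)$. It can be verified that $C^\alpha$ satisfies the following conditions:

$$C^\alpha(p_1, \dots, p_n) = \begin{cases} \delta, & \text{if } p_i = \alpha_i,\ i = 1, \dots, n, \\ \delta \& \sigma, & \text{if } \forall i : \Box p_i = \Box\alpha_i \text{ and } \exists j : p_j \ne \alpha_j, \\ 0, & \text{if } \exists i : \Box p_i \ne \Box\alpha_i. \end{cases}$$

Denote by $\Gamma$ the set of all ordered sets of $n$ elements from the set $\{0, \rho, \sigma, 1\}$. Consider the term

$$t(p_1, \dots, p_n) = \bigvee_{\gamma \in \Gamma} C^\gamma(p_1, \dots, p_n). \qquad (3)$$

Let us show that the term $t$ is the one sought. To prove this it is sufficient to convince ourselves that $t[\alpha_1, \dots, \alpha_n] = f(\alpha_1, \dots, \alpha_n)$, since the set of elements $\alpha$ is taken arbitrarily. The relation (3) can be rewritten as:

$$t(p_1, \dots, p_n) = \bigvee_{\gamma \in \Gamma,\ \gamma = \alpha} C^\gamma(p_1, \dots, p_n) \ \vee \bigvee_{\gamma \in \Gamma,\ \exists i :\, \Box\gamma_i \ne \Box\alpha_i} C^\gamma(p_1, \dots, p_n) \ \vee \bigvee_{\gamma \ne \alpha,\ \forall i :\, \Box\gamma_i = \Box\alpha_i} C^\gamma(p_1, \dots, p_n). \qquad (4)$$

The last relation (4) implies, taking into consideration the properties of the formula $C^\gamma$ and the fact that $f$ conserves the relation $\Box x = \Box y$ (so that $f(\gamma) \& \sigma = \delta \& \sigma$ whenever $\Box\gamma_i = \Box\alpha_i$ for all $i$), the following equality:

$$t[\alpha_1, \dots, \alpha_n] = C^\alpha(\alpha_1, \dots, \alpha_n) \ \vee \bigvee_{\gamma \in \Gamma,\ \exists i :\, \Box\gamma_i \ne \Box\alpha_i} C^\gamma(\alpha_1, \dots, \alpha_n) \ \vee \bigvee_{\gamma \ne \alpha,\ \forall i :\, \Box\gamma_i = \Box\alpha_i} C^\gamma(\alpha_1, \dots, \alpha_n) = \delta \vee 0 \vee (\delta \& \sigma) = \delta = f(\alpha_1, \dots, \alpha_n).$$

Hence, for an arbitrary set of elements $\alpha \in \Gamma$ we have

$$t[\alpha_1, \dots, \alpha_n] = f(\alpha_1, \dots, \alpha_n).$$

So, the term $t$ represents the function $f$ on the algebra $\mathfrak{B}_2$. Theorem 1 is proved.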
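
The construction in the sufficiency part is effective, so it can be tested exhaustively for unary functions. The sketch below is written under two assumptions that should be read as such: the building block is taken to be $x \,\&\, \Box x$ applied to $(p_i \sim \alpha_i)$ (called `strong_box` here), and $\Box(\alpha_1, \alpha_2) = (1, \alpha_1)$ on $\mathfrak{B}_2$. All function names are illustrative.

```python
from itertools import product

B2 = [(0, 0), (0, 1), (1, 0), (1, 1)]   # 0, rho, sigma, 1
ZERO = (0, 0)


def box(a):
    return (1, a[0])


def conj(a, b):
    return (a[0] & b[0], a[1] & b[1])


def disj(a, b):
    return (a[0] | b[0], a[1] | b[1])


def equiv(a, b):
    """(a ~ b) = (a > b) & (b > a), computed component-wise."""
    return tuple(1 if x == y else 0 for x, y in zip(a, b))


def strong_box(a):
    return conj(a, box(a))               # assumed building block: x & box x


def C(alpha, delta, p):
    """C^alpha(p) = (&_i strong_box(p_i ~ alpha_i)) & delta."""
    acc = (1, 1)
    for p_i, a_i in zip(p, alpha):
        acc = conj(acc, strong_box(equiv(p_i, a_i)))
    return conj(acc, delta)


def term_for(f, n):
    """t(p) = join over all gamma of C^gamma(p), with delta = f(gamma)."""
    def t(*p):
        out = ZERO
        for gamma in product(B2, repeat=n):
            out = disj(out, C(gamma, f(*gamma), p))
        return out
    return t


def conserves(f, n):
    """Does f conserve the relation box x = box y on B_2?"""
    for a in product(B2, repeat=n):
        for b in product(B2, repeat=n):
            if all(box(x) == box(y) for x, y in zip(a, b)) \
                    and box(f(*a)) != box(f(*b)):
                return False
    return True


# All 256 unary functions on B_2, given by their value tables.
tables = [dict(zip(B2, vals)) for vals in product(B2, repeat=4)]
good = [d for d in tables if conserves(lambda x, d=d: d[x], 1)]
bad = [d for d in tables if not conserves(lambda x, d=d: d[x], 1)]

recovered = all(
    term_for(lambda x, d=d: d[x], 1)(x) == d[x] for d in good for x in B2
)
bad_differ = all(
    any(term_for(lambda x, d=d: d[x], 1)(x) != d[x] for x in B2) for d in bad
)
```

The list `good` has 64 entries, in agreement with the count of distinct unary terms on $\mathfrak{B}_2$, and for each of them the constructed term reproduces the function; for functions that do not conserve the relation, the constructed term necessarily differs from the function somewhere.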


Definition 1 Two terms $t_1$ and $t_2$ are said to be equivalent on the diagonalizable algebra $\mathfrak{D}$ if they represent the same function $f$ on $\mathfrak{D}$. Terms $t_1$ and $t_2$ are said to be distinct on $\mathfrak{D}$ if they are not equivalent on $\mathfrak{D}$.


The next statement is a consequence of the above theorem.

Proposition 1 There are exactly 64 distinct unary terms on the algebra $\mathfrak{B}_2$.


In order to describe the distinct unary terms of the algebra $\mathfrak{B}_2$ we use Table 1, where $J_{ij}(p)$ $(i = 1, \dots, 8;\ j = 1, \dots, 8)$ denotes the unary term which for $p = 0$ and $p = \rho$ takes values from the i-th column, and for $p = \sigma$ and $p = 1$ takes values from the j-th column.

Table 1 Distinct unary terms of $\mathfrak{B}_2$
For example, /); = 0, lig = p, 3 = 7p, Isg = Up, lig = Llp, and gg = 1. 


4 Important Properties of Some Sequences of 
Diagonalizable Algebras 


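The next statement establishes an isomorphic embedding between the algebras $\mathfrak{B}_i$ and $\mathfrak{B}_{i+1}$.


Proposition 2 Consider the mapping $\varphi\colon \mathfrak{B}_i \to \mathfrak{B}_{i+1}$ constructed according to the rule $\varphi((\alpha_1, \alpha_2, \dots, \alpha_i)) = (\alpha_1, \alpha_2, \dots, \alpha_i, \alpha_i)$. The mapping $\varphi$ is an isomorphic embedding of the algebra $\mathfrak{B}_i$ into the algebra $\mathfrak{B}_{i+1}$.


Proof It is clear that the mapping $\varphi$ is an isomorphic embedding of the boolean algebra $\mathfrak{B}_i$ into the boolean algebra $\mathfrak{B}_{i+1}$. Consider an element $(\alpha_1, \dots, \alpha_i)$ of $\mathfrak{B}_i$. It remains to prove that:

$$\varphi(\Box(\alpha_1, \dots, \alpha_i)) = \Box\varphi((\alpha_1, \dots, \alpha_i)). \qquad (5)$$

According to the definition of the operation $\Box$ on the algebras $\mathfrak{B}_i$ and $\mathfrak{B}_{i+1}$, and in virtue of the properties of the mapping $\varphi$, we have:

$$\varphi(\Box(\alpha_1, \dots, \alpha_i)) = \varphi((1, \alpha_1, \dots, \alpha_1)) = (1, \alpha_1, \dots, \alpha_1, \alpha_1) \qquad (6)$$

and

$$\Box\varphi((\alpha_1, \alpha_2, \dots, \alpha_{i-1}, \alpha_i)) = \Box(\alpha_1, \alpha_2, \dots, \alpha_{i-1}, \alpha_i, \alpha_i) = (1, \alpha_1, \dots, \alpha_1, \alpha_1). \qquad (7)$$

Examining relations (6) and (7) we conclude that equality (5) also holds. Proposition 2 is proved.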
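
For small $i$ the embedding can be verified by exhaustive search. The sketch below assumes $\Box(\alpha_1, \dots, \alpha_n) = (1, \alpha_1, \dots, \alpha_1)$ and tests $i \ge 2$ only, since under this reading the one-component case behaves differently ($\Box$ is constantly 1 there); `phi` and `is_embedding` are illustrative names.

```python
from itertools import product


def box(a):
    """Assumed box of B_n: (a1, ..., an) -> (1, a1, ..., a1)."""
    return (1,) + (a[0],) * (len(a) - 1)


def neg(a):
    return tuple(1 - x for x in a)


def conj(a, b):
    return tuple(x & y for x, y in zip(a, b))


def phi(a):
    """phi(a1, ..., ai) = (a1, ..., ai, ai): repeat the last component."""
    return a + (a[-1],)


def is_embedding(i):
    """Check that phi: B_i -> B_{i+1} is injective and commutes with
    negation, conjunction and box (the other operations are definable)."""
    elems = list(product((0, 1), repeat=i))
    injective = len({phi(a) for a in elems}) == len(elems)
    unary_ok = all(
        phi(neg(a)) == neg(phi(a)) and phi(box(a)) == box(phi(a))
        for a in elems
    )
    binary_ok = all(
        phi(conj(a, b)) == conj(phi(a), phi(b)) for a in elems for b in elems
    )
    return injective and unary_ok and binary_ok


checked = {i: is_embedding(i) for i in range(2, 6)}
```

Every entry of `checked` is true, which is equality (5) together with the boolean part of the claim for $i = 2, \dots, 5$.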


The diagonalizable algebra $\mathfrak{B}_i$ can also be embedded into the diagonalizable algebra $\mathfrak{B}_\infty$, as the next proposition states.


Proposition 3 For any positive integer $i$ the algebra $\mathfrak{B}_i$ is isomorphic to a subalgebra of the algebra $\mathfrak{B}_\infty$.


Proof Consider the mapping $\eta\colon \mathfrak{B}_i \to \mathfrak{B}_\infty$, which assigns an element of $\mathfrak{B}_\infty$ to every element $(\alpha_1, \dots, \alpha_i)$ of $\mathfrak{B}_i$ according to the rule:

$$\eta((\alpha_1, \dots, \alpha_i)) = (\alpha_1, \dots, \alpha_i, \alpha_i, \alpha_i, \dots).$$

Obviously the mapping $\eta$ is an isomorphic embedding of the boolean algebra $\mathfrak{B}_i$ into the algebra $\mathfrak{B}_\infty$. It remains to verify that the next relation holds:

$$\eta(\Box(\alpha_1, \dots, \alpha_i)) = \Box\eta((\alpha_1, \dots, \alpha_i)). \qquad (8)$$

According to the definitions of the operation $\Box$ on the algebras $\mathfrak{B}_i$ and $\mathfrak{B}_\infty$, and taking into account the properties of the mapping $\eta$, we have:

$$\eta(\Box(\alpha_1, \dots, \alpha_i)) = \eta((1, \alpha_1, \dots, \alpha_1)) = (1, \alpha_1, \dots, \alpha_1, \alpha_1, \dots) \qquad (9)$$

and

$$\Box\eta((\alpha_1, \alpha_2, \dots, \alpha_{i-1}, \alpha_i)) = \Box(\alpha_1, \alpha_2, \dots, \alpha_{i-1}, \alpha_i, \alpha_i, \dots) = (1, \alpha_1, \dots, \alpha_1, \alpha_1, \dots). \qquad (10)$$

Examining equalities (9) and (10) we draw the conclusion that the equality (8) holds too. Proposition 3 is proved.


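Taking into account Propositions 2 and 3, and also the fact that to each diagonalizable algebra there corresponds an extension of the propositional provability logic GL, we have


Proposition 4

$$GL \subseteq L\mathfrak{B}_\infty \subseteq \dots \subseteq L\mathfrak{B}_i \subseteq \dots \subseteq L\mathfrak{B}_3 \subseteq L\mathfrak{B}_2 \qquad (11)$$


Proof Obvious.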


In the next lemma we show that there are finitely many distinct terms of $n$ variables in the diagonalizable algebra $\mathfrak{B}_\infty$.


Lemma 1 The diagonalizable algebra $\mathfrak{B}_\infty$ is n-equal to the algebra $\mathfrak{B}_{2^n+1}$.


Proof Indeed, since the algebra $\mathfrak{B}_{2^n+1}$ is isomorphic to a subalgebra of the algebra $\mathfrak{B}_\infty$, it follows that the number of distinct terms of $n$ variables in $\mathfrak{B}_{2^n+1}$ is less than or equal to the corresponding number of distinct terms of $n$ variables in $\mathfrak{B}_\infty$.

Let us note that two diagonalizable algebras are n-equal if and only if the corresponding propositional provability logics have the same pairwise non-equivalent formulas containing at most $n$ variables.

The inclusion $L\mathfrak{B}_\infty \subseteq L\mathfrak{B}_{2^n+1}$ follows from the relation (11). Suppose an arbitrary formula $F$, containing only the variables $p_1, \dots, p_n$, satisfies the relation $F \in L\mathfrak{B}_{2^n+1}$. Suppose $F \notin L\mathfrak{B}_\infty$. Then, according to relation (11), there is a positive integer $i$ such that $F \notin L\mathfrak{B}_i$. Subsequently, there is an ordered set of elements $(\alpha_1, \dots, \alpha_n)$ from the algebra $\mathfrak{B}_i$ such that

$$F(p_1/\alpha_1, \dots, p_n/\alpha_n) \ne 1.$$


Consider in the algebra 8; the boolean subalgebra 8 generated by the elements 
Q|,..., Q,. Itis clear B is isomorphic to some boolean subalgebra 8 ; from the list 
(1). Since the boolean algebra 8; is generated by n generators, the number of its 
elements is not greater than number of elements of the free boolean algebra with n 
generators, which, according to [4], is equal to 2”, i.e it is equal to the number of 
elements of the algebra 8>.. It is also known the algebra 8» has 2” atoms. Denote 
by A the set of the elements of the algebra 8. If the element p € A then the boolean 
algebra By. is closed relative to operation U1. Suppose p ¢ A. Then the set of elements 
{p} U A generates a boolean subalgebra 8’ which is isomorphic to the algebra Ban +1, 
that is closed relative to the operation LJ. Thus, by the force of the relation (11), we 
have LB>.,, C LS’. Taking into account (2), the last relation implies F ¢ L841. 
The last statement contradicts our initial supposition that F € L8.,,. This means 
that F ¢ LS... Considering the note at the beginig of the proof we conclude the 
Lemma | is proved. 


Lemma 2 If the diagonalizable algebra $\mathfrak{D}_1$ is n-equal to the diagonalizable algebra $\mathfrak{D}_2$ then any term expressible via a system of terms $\Sigma$ on $\mathfrak{D}_1$ is also expressible via $\Sigma$ on $\mathfrak{D}_2$.


Proof Suppose the term $t(p_1, \dots, p_n)$ is expressible on $\mathfrak{D}_1$ via the system of terms $\Sigma$. Then the same representation of $t$ via $\Sigma$ exists on $\mathfrak{D}_2$ as on $\mathfrak{D}_1$, since the two algebras are n-equal. The lemma is proved.


The next lemma presents a property of the constants connected to their
representability in the diagonalizable algebras $\mathfrak{B}_2$ and $\mathfrak{B}_{\infty}$.


Definition 2 Suppose $\Sigma$ is some set of terms of a diagonalizable algebra $\mathfrak{D}$. Any
term of $\Sigma$ is a term over $\Sigma$. If $t_0(x_1, \dots, x_n)$ is a term from $\Sigma$ and $t_1, \dots, t_n$ are either
variables or terms over $\Sigma$, then the term $t_0(t_1, \dots, t_n)$ is a term over $\Sigma$. In this case we
also say that the term is expressible via the terms of $\Sigma$ on $\mathfrak{D}$.


Definition 3 A term $t(x_1, \dots, x_n)$ of the diagonalizable algebra $\mathfrak{D}$ is referred to as
a constant (term) of $\mathfrak{D}$ if there is an element $a \in \mathfrak{D}$ such that for any ordered set
of elements $(\alpha_1, \dots, \alpha_n)$ of $\mathfrak{D}$ we have $t(\alpha_1, \dots, \alpha_n) = a$.


Lemma 3 Any constant that is expressible via a system of terms $\Sigma$ on the
diagonalizable algebra $\mathfrak{B}_2$ is also expressible via $\Sigma$ on the algebra $\mathfrak{B}_{\infty}$.


Proof Suppose $t(x)$ is a constant and is expressible on the algebra $\mathfrak{B}_2$ via a system
of terms $\Sigma$. By Lemma 1 we have $\mathfrak{B}_{\infty} =_0 \mathfrak{B}_2$. Consequently, the term $t(x)$ is also
expressible via $\Sigma$ on $\mathfrak{B}_{\infty}$. Lemma 3 is proved.


The next statement is a consequence of Lemma 3.


Corollary 1 The diagonalizable algebra $\mathfrak{B}_{\infty}$ contains only four distinct constants.


5 Results


5.1 On Existence of an Algorithm for Determination of 
Functional Expressibility in Local Tabular 
Diagonalizable Algebras 


Let us start with some definitions and results formulated and established by A. V.
Kuznetsov in the case of superintuitionistic logics [5], adapted here to the case of
diagonalizable algebras.

We say that a term $t$ is detachable from the system of terms $\Sigma$ on the diagonalizable
algebra $\mathfrak{D}$ if there is a relation $R$ such that all terms of $\Sigma$ conserve $R$ on $\mathfrak{D}$, and $t$ does
not conserve $R$ on $\mathfrak{D}$. The diagonalizable algebra $\mathfrak{D}$ is said to be finitely approximable
(relative to expressibility of terms) if, for any term $t$ and for any system of terms $\Sigma$,
whenever $t$ is not expressible via $\Sigma$ on $\mathfrak{D}$ there exist a finite diagonalizable algebra $\mathfrak{B}$
embeddable in $\mathfrak{D}$ and a relation $R$ on $\mathfrak{B}$ such that the term $t$ is detachable from the
system $\Sigma$ by the relation $R$ on the algebra $\mathfrak{B}$.

The diagonalizable algebra $\mathfrak{D}$ is said to be n-ary tabular if there is a finite
diagonalizable algebra $\mathfrak{B}$ such that $\mathfrak{D} =_n \mathfrak{B}$. For example, the diagonalizable algebra $\mathfrak{B}_{\infty}$,
as follows from Lemma 1, is n-ary tabular.


Lemma 4 The term $t$ is detachable from the system of terms $\Sigma$ on the diagonalizable
algebra $\mathfrak{B}$ by some relation $R$ if $\mathfrak{B}$ is finite and the term $t$ is not expressible via $\Sigma$ on
the algebra $\mathfrak{B}$.


Proof Suppose the support of the diagonalizable algebra $\mathfrak{B}$ contains only the
elements $\beta_1, \dots, \beta_k$ and let $t$ be some term containing $n$ distinct variables.
Consider all possible terms $f_1, \dots, f_l$ on $\mathfrak{B}$ which depend only on the variables $x_1, \dots, x_n$,
each of them represented by a term that is expressible via $\Sigma$ ($l \le k^{m}$, $m = k^{n}$).
Denote by $\delta_{ij}$ ($i = 1, \dots, m$; $j = 1, \dots, l$) the value $f_j(\beta_{i_1}, \dots, \beta_{i_n})$, where
$\beta_{i_r} \in \{\beta_1, \dots, \beta_k\}$, $r = 1, \dots, n$. Note that all terms of $\Sigma$ conserve the matrix
consisting of the elements $\delta_{ij}$ ($i = 1, \dots, m$; $j = 1, \dots, l$). The relation corresponding to this
matrix is the sought relation $R$ from the statement. The lemma is proved.
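
The construction in the proof of Lemma 4 can be tried out on a small finite algebra. The sketch below is only an illustration under stated assumptions: it uses the two-element Boolean algebra instead of a diagonalizable algebra, the names `expressible_tables` and `preserves` are hypothetical, and the relation is stored as the set of value tables of all expressible n-ary functions (the columns of the matrix described above).

```python
from itertools import product

def expressible_tables(elements, sigma, n):
    """Value tables of all n-ary functions obtained from the variables
    x_1..x_n by repeatedly applying the operations in sigma.

    sigma is a list of (function, arity) pairs; a table lists the values
    of a function at the points of elements**n in a fixed order."""
    points = list(product(elements, repeat=n))
    tables = {tuple(pt[i] for pt in points) for i in range(n)}  # the variables
    changed = True
    while changed:
        changed = False
        for op, arity in sigma:
            # iterate over a snapshot, because new tables are added on the fly
            for args in product(list(tables), repeat=arity):
                new = tuple(op(*(g[r] for g in args)) for r in range(len(points)))
                if new not in tables:
                    tables.add(new)
                    changed = True
    return points, tables

def preserves(op, arity, relation):
    """True if applying op coordinate-wise to tuples of the relation
    always yields a tuple of the relation."""
    width = len(next(iter(relation)))
    return all(
        tuple(op(*(row[c] for row in rows)) for c in range(width)) in relation
        for rows in product(list(relation), repeat=arity)
    )

# Assumed toy data: Sigma = {AND, OR}, candidate term t = XOR, n = 2.
AND = (lambda x, y: x & y, 2)
OR = (lambda x, y: x | y, 2)
XOR = (lambda x, y: x ^ y, 2)

points, R = expressible_tables((0, 1), [AND, OR], n=2)
xor_table = tuple(XOR[0](*pt) for pt in points)
```

In this toy case every operation of the system preserves `R`, while XOR, which is not expressible via AND and OR, does not; this is the sense in which the relation detaches the term from the system.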


Theorem 2 Any local tabular diagonalizable algebra is finitely approximable
relative to expressibility of terms.


Proof Let $\mathfrak{D}$ be some local tabular diagonalizable algebra and let $t$ be any n-ary term
that is not expressible via $\Sigma$ on $\mathfrak{D}$. Due to the local tabularity of $\mathfrak{D}$ there exists a finite
diagonalizable algebra $\mathfrak{B}$ such that $\mathfrak{D} =_n \mathfrak{B}$. Since $t$ is not expressible via $\Sigma$ on
$\mathfrak{D}$, taking into account Lemma 2, $t$ is also not expressible via $\Sigma$ on $\mathfrak{B}$. Then, according to
Lemma 4, there is a relation $R$ such that $t$ is detachable from $\Sigma$ by $R$ on $\mathfrak{B}$. The theorem
is proved.


We say that the diagonalizable algebra $\mathfrak{D}$ is decidable relative to expressibility of
terms if there is an algorithm which, for an arbitrary term $t$ and for any system of
terms $\Sigma$, answers the question whether $t$ is expressible on $\mathfrak{D}$ via $\Sigma$
or not. The diagonalizable algebra $\mathfrak{D}$ is called enumerable relative to expressibility
of terms if there exists an algorithm which, for any term $g$ and for any list of terms
$\Sigma$, gives a positive answer to the question whether $g$ is expressible via $\Sigma$ on $\mathfrak{D}$
whenever such expressibility exists.

The pair $\langle I, T \rangle$ is called an expression of the term $t$ via the system of terms $\Sigma$ from
the term $h$ if $T$ is a finite sequence of terms, the last element of which is the term $t$, and
each term of the sequence is either a variable, or belongs to $\Sigma$, or is obtained from
previous elements of the sequence by applying one of the following rules: (a) the weak
rule of substitution, and (b) the rule of replacement by an equivalent term on the
diagonalizable algebra endowed with the additional equation $h = 1$.


Definition 4 We say that a diagonalizable algebra $\mathfrak{D}$ is generated by the term $h$ if
the identity $h = 1$ holds on $\mathfrak{D}$.


Theorem 3 If some diagonalizable algebra $\mathfrak{D}$ is finitely approximable relative to
expressibility of terms then it is decidable relative to expressibility of terms.


Proof The desired algorithm, which decides for any term
$t$ and for any system of terms $\Sigma$ whether $t$ is expressible via $\Sigma$ on $\mathfrak{D}$, consists
of two algorithms working in parallel. The first algorithm, whose existence
results from Theorem 2, is used to obtain a positive answer to the question. The
second algorithm, used to obtain a negative answer, is designed on the basis of the term $h$
which determines the diagonalizable algebra generated by $h$. It enumerates,
in an established order, all non-isomorphic diagonalizable algebras
$\mathfrak{A}$ endowed with an additional relation $P$, searching for a relation $P$ such that $h = 1$
is valid on $\mathfrak{A}$ and the term $t$ is detachable from $\Sigma$ on $\mathfrak{A}$ by the relation $P$.

Suppose the diagonalizable algebra $\mathfrak{D}$ is finitely approximable relative to expressibility
of terms. Then $\mathfrak{D}$ is generated by the term $h$. So, the term $t$ is expressible via $\Sigma$ on $\mathfrak{D}$
if and only if there is an expression of the term $t$ via $\Sigma$ from $h$.

It is possible to decide algorithmically, for an arbitrary term $t$, for
an arbitrary term $h$, for an arbitrary system of terms $\Sigma$ and for any pair of finite
sequences of terms, whether this pair of sequences is an expression of the term $t$ via $\Sigma$ from
the term $h$. Therefore the sought algorithm can be one that, for any fixed term $h$, executes
the following steps: for any term $t$ and for any list of terms $\Sigma$ it examines all possible
finite sequences of terms in an established order (for example, the order
determined by the corresponding Gödel numbers) and verifies for each pair
whether it is an expression of the term $t$ via $\Sigma$ from the term $h$ or not.

The theorem is proved.
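
The idea of running a "positive" and a "negative" search in parallel can be sketched generically. The code below is a toy illustration, not the algorithm of the proof: the function names are hypothetical, and the two searches enumerate integers rather than expressions of terms and finite algebras with relations. It assumes, as in the proof, that exactly one of the two searches eventually succeeds.

```python
from itertools import count

def dovetail(positive_steps, negative_steps):
    """Alternate single steps of two semi-decision procedures.

    Each argument is an iterator that yields True at the step where it
    finds its certificate and False at every other step."""
    for found_yes, found_no in zip(positive_steps, negative_steps):
        if found_yes:
            return True
        if found_no:
            return False

def is_perfect_square(n):
    # positive search: look for a witness k with k*k == n
    positive = (k * k == n for k in count())
    # negative search: look for a refutation k with k*k < n < (k+1)**2
    negative = (k * k < n < (k + 1) ** 2 for k in count())
    return dovetail(positive, negative)
```

Neither search halts on its own for every input, but alternating their steps always produces an answer, which mirrors how the two enumerations in the proof are combined.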


Theorem 4 If a diagonalizable algebra is local tabular and is finitely approximable
then it is decidable relative to expressibility of terms, and it is also decidable relative
to completeness as to expressibility of terms.


Proof The proof of this theorem results in essence from the previous theorem, taking
into account that the term $h$ from the description of the algorithm in the above theorem
implies the diagonalizable algebra is finitely approximable.


Definition 5 A diagonalizable algebra $\mathfrak{D}$ is local tabular if for any positive
integer $n$ there is a finite diagonalizable algebra $\mathfrak{A}$ such that $\mathfrak{D} =_n \mathfrak{A}$.


It is worth mentioning here a relatively recent result of Professor Mefodie Rata, who
established the undecidability of the problem of expressibility for the propositional
provability logic GL [12], which, translated into the language of diagonalizable algebras,
states the following.


Theorem 5 [12] Let $\mathfrak{D}$ be a diagonalizable algebra which is not local tabular. Then it is
undecidable relative to expressibility of terms, i.e. there is no algorithm to decide
whether a given term $t$ is expressible via a system of terms $\Sigma$ on $\mathfrak{D}$.


Remark 1 The last theorem does not imply that the problem of completeness relative to
expressibility of terms in non-local tabular diagonalizable algebras is also
undecidable.


5.2 Completeness Relative to Expressibility of Terms for an 
Infinite Sequence of Diagonalizable Algebras 


Two diagonalizable algebras $\mathfrak{D}_1$ and $\mathfrak{D}_2$ are called equal relative to expressibility of
terms if a system of terms is complete relative to expressibility of terms on $\mathfrak{D}_1$ if and
only if it is complete relative to expressibility of terms on $\mathfrak{D}_2$ (for the case of logics see
[5, 13, 14]). The next theorem establishes an equality relation relative to completeness
as to expressibility of terms between the algebras $\mathfrak{B}_{\infty}$ and $\mathfrak{B}_5$.


Theorem 6 The diagonalizable algebra $\mathfrak{B}_{\infty}$ is equal relative to expressibility of
terms to the diagonalizable algebra $\mathfrak{B}_5$.


Proof Suppose the system of terms $\Sigma$ is complete relative to expressibility of terms
on the diagonalizable algebra $\mathfrak{B}_{\infty}$. Then, taking into account relations (2), the system
$\Sigma$ is also complete as to expressibility of terms on the algebra $\mathfrak{B}_5$. Now suppose $\Sigma$ is
complete on $\mathfrak{B}_5$. Then according to Lemma 1 there are terms $m(x)$, $n(x)$ and $c(x, y)$,
which are represented via $\Sigma$, such that the following identities hold in $\mathfrak{B}_5$:

$$\Box x = m(x), \quad \neg x = n(x), \quad (x \,\&\, y) = c(x, y).$$

Since by Lemma 1 the diagonalizable algebras $\mathfrak{B}_{\infty}$ and $\mathfrak{B}_5$ are 2-ary equal,
those three identities are also valid on the algebra $\mathfrak{B}_{\infty}$. Consequently, any term of
the system $\{x \,\&\, y,\ x \vee y,\ x \supset y,\ \neg x,\ \Box x\}$ is expressible via terms of $\Sigma$, and thus $\Sigma$
is complete relative to expressibility of terms on $\mathfrak{B}_{\infty}$.
The theorem is proved.


As a consequence of Theorem 6 we get the following statement.


Corollary 2 The diagonalizable algebra $\mathfrak{B}_{\infty}$ is tabular relative to completeness as
to expressibility of terms.


The diagonalizable algebra $\mathfrak{D}$ is called decidable relative to completeness as to
expressibility of terms if there is an algorithm that decides, for any finite
set of terms $\Sigma$, whether $\Sigma$ is complete on $\mathfrak{D}$ or not (for the case of logics see [5, 13,
14]). It is known that all tabular algebras are decidable relative to completeness as
to expressibility of terms [5]. Therefore, taking into account Corollary 2, we get


Theorem 7 The diagonalizable algebra $\mathfrak{B}_{\infty}$ is decidable relative to completeness
as to expressibility of terms.


Theorem 7 implies only the existence of the sought algorithm, which
solves the problem of completeness relative to expressibility of terms on the algebra
$\mathfrak{B}_{\infty}$. It consists of two algorithms working in parallel. One of them
examines all possible sequences which state the expressibility of some terms
via the system of terms $\Sigma$ on $\mathfrak{B}_{\infty}$ until all terms of the system
$\{\neg x, \Box x, x \,\&\, y\}$ are obtained. The second algorithm looks for a negative answer to the same
question by considering relations on the algebra $\mathfrak{B}_5$ in an appropriate order, searching for a relation such that
the system $\Sigma$ conserves it on the algebra $\mathfrak{B}_5$, while not all terms of the system
$\{\neg x, \Box x, x \,\&\, y\}$ conserve it on $\mathfrak{B}_5$. The presented algorithm is very complex, so
the problem of finding a better algorithm for deciding completeness as
to expressibility of terms on $\mathfrak{B}_{\infty}$ is of current interest.

A possible approach to this problem is to examine it in the case of the
simpler diagonalizable algebras $\mathfrak{B}_2$, $\mathfrak{B}_3$, $\mathfrak{B}_4$, $\mathfrak{B}_5$, which are subalgebras of the
algebra $\mathfrak{B}_{\infty}$.


5.3 Conditions for Expressibility of Constants in $\mathfrak{B}_{\infty}$


Consider the following relations on $\mathfrak{B}_2$ (read the symbol “==” as “defined by”):


(1) Riv) == (Ox = 0); 

(2) Ro(x) == (Ox = 1); 

(3) R3(x) == Ihs(x) = x); 

(4) Ra(x) == Ig(x) = x); 

(5) Rs(x) == I45(x) = x); 

(6) Re(x) == Iyg(x) = x); 

(7) R7(x) == Ihs(x) = x); 

(8) Rg(x) == heg(x) = x); 

(9) Ro(x) == I6(x) = x); 
(10) Rio) == Iyo(x) = x); 
(11) Ri, y) == C37) = y); 
(12) $R_{12}(x, y)$ == ($\Box x \neq \Box y$);


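We denote by $\mathfrak{M}_i$ the matrix corresponding to the relation $R_i$ on the algebra $\mathfrak{B}_2$
and by $\Pi_i$ the class of all terms which preserve the relation $R_i$ on the
algebra $\mathfrak{B}_2$, i.e. the class of all terms which conserve the matrix $\mathfrak{M}_i$ on $\mathfrak{B}_2$, for any
$i = 1, \dots, 12$.

Table 2 The classes of terms and the corresponding defining matrices

Class        Defining matrix
$\Pi_1$      $(0\ \rho)$
$\Pi_2$      $(\sigma\ 1)$
$\Pi_3$      $(0\ \sigma)$
$\Pi_4$      $(0\ 1)$
$\Pi_5$      $(\rho\ \sigma)$
$\Pi_6$      $(\rho\ 1)$
$\Pi_7$      $(0\ \rho\ \sigma)$
$\Pi_8$      $(0\ \rho\ 1)$
$\Pi_9$      $(0\ \sigma\ 1)$
$\Pi_{10}$   $(\rho\ \sigma\ 1)$
$\Pi_{11}$   $\begin{pmatrix} 0 & \rho & \sigma & 1 \\ \rho & 0 & 1 & \sigma \end{pmatrix}$
$\Pi_{12}$   $\begin{pmatrix} 0 & 0 & \rho & \rho & \sigma & \sigma & 1 & 1 \\ \sigma & 1 & \sigma & 1 & 0 & \rho & 0 & \rho \end{pmatrix}$

Table 2 presents the list of all classes $\Pi_1, \dots, \Pi_{12}$ and their corresponding
matrices.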
Theorem 8 Suppose the terms $f_1, \dots, f_{12}$ do not preserve the corresponding
relations $R_1, \dots, R_{12}$ on the diagonalizable algebra $\mathfrak{B}_2$. Then the constants $0, \rho, \sigma, 1$ are
expressible on $\mathfrak{B}_2$ via the terms $f_1, \dots, f_{12}$.


The proof of the theorem follows from the next 5 lemmas.


Lemma 5 The term $a(p)$, where
$$a[0] \in \{\sigma, 1\} \qquad (12)$$
is expressible on $\mathfrak{B}_2$ via the term $f_1$.


Proof Indeed, the term $f_1$ does not conserve the relation $R_1$ on the algebra $\mathfrak{B}_2$. Then
there exists an ordered set of elements $(\alpha_1, \dots, \alpha_n)$ from $\mathfrak{B}_2$ such that
$$\alpha_i \in \{0, \rho\} \quad (i = 1, \dots, n) \qquad (13)$$
$$f_1[\alpha_1, \dots, \alpha_n] \in \{\sigma, 1\} \qquad (14)$$
Since $f_1$ conserves the relation $\Box x = \Box y$ on the algebra $\mathfrak{B}_2$, in view of relations
(13) and (14) we also have that
$$f_1[0, \dots, 0] \in \{\sigma, 1\} \qquad (15)$$
Let $a(p) = f_1[p_1/p, \dots, p_n/p]$. By virtue of (15) we obtain $a[0] \in \{\sigma, 1\}$.


Lemma 6 The term $b(p)$, where
$$b[1] \in \{0, \rho\} \qquad (16)$$
is expressible on $\mathfrak{B}_2$ via $f_2$.


Proof The validity of Lemma 6 follows from the fact that its formulation is dual
to the formulation of Lemma 5 with respect to $\neg x$, where the term $b$ is considered in
place of the corresponding term $a$.


Lemma 7 Let the terms $a$ and $b$ satisfy the relations (12), (16) and
$$b[\sigma] = b[1]. \qquad (17)$$
Then at least one of the constants $0$ or $\rho$ is expressible via the terms $a$, $b$ and $f_{12}$ on $\mathfrak{B}_2$.


Proof Let $b$ satisfy the relations (16) and (17). Then two cases are possible for the
term $b$: (1) $b[0] \in \{0, \rho\}$; (2) $b[0] \in \{\sigma, 1\}$. Observe that in the first case the term
$b[a[b(p)]]$ represents one of the constants $0$ or $\rho$.

Consider case (2), i.e. $b[0] \in \{\sigma, 1\}$. Consider the term $f_{12}$, which does not preserve
$R_{12}$ on $\mathfrak{B}_2$. Then there are two ordered sets of elements $(\alpha_1, \dots, \alpha_n)$ and $(\beta_1, \dots, \beta_n)$
from the algebra $\mathfrak{B}_2$ such that
$$\Box\alpha_i \neq \Box\beta_i \quad (i = 1, \dots, n) \qquad (18)$$
$$\Box f_{12}[\alpha_1, \dots, \alpha_n] = \Box f_{12}[\beta_1, \dots, \beta_n] \qquad (19)$$

We build the term $d(p_1, \dots, p_8) = f_{12}[d_1, \dots, d_n]$, where for every $i = 1, \dots, n$
$d_i(p_1, \dots, p_8) = p_1$, if $\alpha_i = 0$, $\beta_i = \sigma$,
$d_i(p_1, \dots, p_8) = p_2$, if $\alpha_i = 0$, $\beta_i = 1$,
$d_i(p_1, \dots, p_8) = p_3$, if $\alpha_i = \rho$, $\beta_i = \sigma$,
$d_i(p_1, \dots, p_8) = p_4$, if $\alpha_i = \rho$, $\beta_i = 1$,
$d_i(p_1, \dots, p_8) = p_5$, if $\alpha_i = \sigma$, $\beta_i = 0$,
$d_i(p_1, \dots, p_8) = p_6$, if $\alpha_i = \sigma$, $\beta_i = \rho$,
$d_i(p_1, \dots, p_8) = p_7$, if $\alpha_i = 1$, $\beta_i = 0$,
$d_i(p_1, \dots, p_8) = p_8$, if $\alpha_i = 1$, $\beta_i = \rho$
(by virtue of relation (18) other cases are impossible). It is clear that
$d_i[0, 0, \rho, \rho, \sigma, \sigma, 1, 1] = \alpha_i$ and $d_i[\sigma, 1, \sigma, 1, 0, \rho, 0, \rho] = \beta_i$. Then, taking into account the
design of the term $d$, the relation (19) and the last equalities, we obtain
$$\begin{pmatrix} d[0, 0, \rho, \rho, \sigma, \sigma, 1, 1] \\ d[\sigma, 1, \sigma, 1, 0, \rho, 0, \rho] \end{pmatrix} \in \begin{pmatrix} 0 & 0 & \rho & \rho & \sigma & \sigma & 1 & 1 \\ 0 & \rho & 0 & \rho & \sigma & 1 & \sigma & 1 \end{pmatrix} \qquad (20)$$
Consider now the term $d^*(p, q) = d[p, p, p, p, q, q, q, q]$. By (20) and the fact
that $d$ conserves on the algebra $\mathfrak{B}_2$ the predicate $\Box x = \Box y$, we obtain
$$\begin{pmatrix} d^*[0, 1] \\ d^*[1, 0] \end{pmatrix} \in \begin{pmatrix} 0 & 0 & \rho & \rho & \sigma & \sigma & 1 & 1 \\ 0 & \rho & 0 & \rho & \sigma & 1 & \sigma & 1 \end{pmatrix} \qquad (21)$$

Let us examine the term $d'(p, q)$, defined by the scheme

$$d'(p, q) = \begin{cases} d^*(p, q), & \text{if } d^*[0, 1] \in \{0, \rho\}, \\ b[d^*(p, q)], & \text{if } d^*[0, 1] \in \{\sigma, 1\}. \end{cases}$$

By virtue of the relation (17) and taking into consideration (21), the term $d'$ satisfies
the inclusion $\{d'[0, 1], d'[1, 0]\} \subseteq \{0, \rho\}$. Therefore, in the second case, taking
into consideration (16), the relation $b[d'[p, b(p)]] \in \{\sigma, 1\}$ holds. Hence, on the
basis of the conditions (16) and (17), the term $b[b[d'[p, b(p)]]]$ represents one of
the constants $0$ or $\rho$.


Lemma 8 Let the terms $a$ and $b$ satisfy the relations (12), (16) and
$$b[\sigma] \neq b[1]. \qquad (22)$$
Then at least one of the constants $0$ or $\rho$ is expressible via the terms $a$, $b$, $f_3$, $f_4$, $f_7$, $f_{11}$, $f_{12}$
on the algebra $\mathfrak{B}_2$.


Proof The relation (16) and the fact that $b$ conserves the relation $\Box x = \Box y$ on
the algebra $\mathfrak{B}_2$ imply that there are two possible situations: (1) $b[1] = \rho$, and (2)
$b[1] = 0$.

Let us consider the first case. On the basis of the relation (22) we have that
$b[\sigma] = 0$. We consider the term $f_3$. Since it does not conserve $R_3$ on $\mathfrak{B}_2$, there exists
an ordered set $(\alpha_1, \dots, \alpha_n)$ of elements of $\mathfrak{B}_2$ such that
$$\alpha_i \in \{0, \sigma\}, \quad i = 1, \dots, n \qquad (23)$$
$$f_3[\alpha_1, \dots, \alpha_n] \in \{\rho, 1\}. \qquad (24)$$
We design the term $e(p) = f_3[e_1, \dots, e_n]$, where for every $i = 1, \dots, n$
$$e_i(p) = \begin{cases} b(p), & \text{if } \alpha_i = 0, \\ p, & \text{if } \alpha_i = \sigma \end{cases}$$
(in accordance with (23) other cases are impossible for the elements $\alpha_i$). The term $e$ is
directly expressible via the terms $f_3$ and $b$. Obviously, $e_i[\sigma] = \alpha_i$, and in view of relation
(24) we have $e[\sigma] = f_3[e_1[\sigma], \dots, e_n[\sigma]] = f_3[\alpha_1, \dots, \alpha_n] \in \{\rho, 1\}$. Consider the
term $e^*(p)$, defined by the scheme

$$e^*(p) = \begin{cases} e(p), & \text{if } e[\sigma] = \rho, \\ b[e(p)], & \text{if } e[\sigma] = 1. \end{cases}$$

The term $e^*(p)$ is directly expressible via the terms $b$ and $e$ and satisfies the condition
$$e^*[\sigma] = \rho. \qquad (25)$$

Two sub-cases are possible: (1.1) $e^*[1] = \rho$ and (1.2) $e^*[1] = 0$. In the sub-case
(1.1) the term $e^*(p)$ satisfies conditions analogous to (16) and (17) for the term $b(p)$ from
Lemma 7, so the proof follows the corresponding proof of Lemma 7,
and one of the two constants $0$ or $\rho$ is obtained.

Consider now the sub-case (1.2), when $e^*[1] = 0$. Consider the term $f_7$. Since $f_7$ does
not conserve the relation $R_7$ on $\mathfrak{B}_2$, there exists an ordered set of elements
$(\beta_1, \dots, \beta_n)$ from $\mathfrak{B}_2$ such that
$$\beta_i \in \{0, \rho, \sigma\}, \quad i = 1, \dots, n \qquad (26)$$
$$f_7[\beta_1, \dots, \beta_n] = 1 \qquad (27)$$
Take the term $h(p) = f_7[h_1, \dots, h_n]$, where for every $i = 1, \dots, n$

$h_i(p) = b(p)$, if $\beta_i = 0$,
$h_i(p) = e^*(p)$, if $\beta_i = \rho$,
$h_i(p) = p$, if $\beta_i = \sigma$

(obviously, other cases do not occur for the elements $\beta_i$). The term $h$ is directly
expressible via $f_7$, $b$ and $e^*$. It is clear that $h_i[\sigma] = \beta_i$, and in agreement with relation (27) we
have
$$h[\sigma] = 1. \qquad (28)$$

If $h[1] = 1$, then the term $b[h(p)]$ satisfies conditions analogous to conditions
(16) and (17) for the term $b$ from Lemma 7. That is why we can obtain one of the
constants $0$ or $\rho$ in the case $h[1] = 1$ in the same way as in Lemma 7.

Let $h[1] = \sigma$. Use the term $f_{11}$. It follows from its properties that there exist
two ordered sets of elements $(\gamma_1, \dots, \gamma_n)$ and $(\delta_1, \dots, \delta_n)$ from $\mathfrak{B}_2$ such that the
following relation holds
$$R_{11}(\gamma_i, \delta_i), \quad i = 1, \dots, n \qquad (29)$$
Taking also into consideration Theorem 1, we have
$$f_{11}[\gamma_1, \dots, \gamma_n] = f_{11}[\delta_1, \dots, \delta_n]. \qquad (30)$$
Design the term $j(p) = f_{11}[j_1(p), \dots, j_n(p)]$, where for any $i = 1, \dots, n$

$$j_i(p) = \begin{cases} b(p), & \text{if } \gamma_i = 0,\ \delta_i = \rho, \\ e^*(p), & \text{if } \gamma_i = \rho,\ \delta_i = 0, \\ p, & \text{if } \gamma_i = \sigma,\ \delta_i = 1, \\ h(p), & \text{if } \gamma_i = 1,\ \delta_i = \sigma \end{cases}$$

(by the properties of the relation (29) the elements $\gamma_i$ and $\delta_i$ do not take other values).
The term $j(p)$ is directly expressible via $b$, $e^*$, $h$ and $f_{11}$. Let us notice that $j_i[\sigma] = \gamma_i$,
$j_i[1] = \delta_i$ and, hence, by relation (30), we obtain $j[\sigma] = j[1]$. So, the term $j^*(p)$,
defined by the scheme

$$j^*(p) = \begin{cases} j(p), & \text{if } j[1] \in \{0, \rho\}, \\ b[j(p)], & \text{if } j[1] \in \{\sigma, 1\}, \end{cases}$$

satisfies the relations
$$j^*[\sigma] = j^*[1], \quad j^*[1] \in \{0, \rho\} \qquad (31)$$

Let us notice that conditions (31) are analogous to conditions (16) and (17) from
Lemma 7. Hence, we can obtain in a similar way one of the constants $0$ or $\rho$. So, the
proof of Lemma 8 in the case (1) is finished.

Let us consider the second case, when $b[1] = 0$. Examine the term $f_4$. Since it
does not conserve the relation $R_4$ on $\mathfrak{B}_2$, there is an ordered set of elements
$(\varepsilon_1, \dots, \varepsilon_n)$ of $\mathfrak{B}_2$ such that
$$\varepsilon_i \in \{0, 1\}, \quad i = 1, \dots, n \qquad (32)$$
$$f_4[\varepsilon_1, \dots, \varepsilon_n] \in \{\rho, \sigma\} \qquad (33)$$
Design the term $s(p) = f_4[s_1, \dots, s_n]$, where for any $i = 1, \dots, n$ we have
$$s_i(p) = \begin{cases} b(p), & \text{if } \varepsilon_i = 0, \\ p, & \text{if } \varepsilon_i = 1 \end{cases}$$
(by the properties of (32) we do not have other cases). The term $s$ is directly expressible
via $f_4$ and $b$. Obviously $s_i[1] = \varepsilon_i$, and in agreement with relation (33) we have

$$s[1] = f_4[s_1[1], \dots, s_n[1]] = f_4[\varepsilon_1, \dots, \varepsilon_n] \in \{\rho, \sigma\}.$$

Consider the term $s^*(p)$ defined by the scheme

$$s^*(p) = \begin{cases} s(p), & \text{if } s[1] = \rho, \\ b[s(p)], & \text{if } s[1] = \sigma. \end{cases}$$

The term $s^*(p)$ is directly expressible via $b$ and $s$ and satisfies the condition $s^*[1] = \rho$.
Taking into consideration Theorem 1, we also have the relation $s^*[\sigma] \in \{0, \rho\}$. If
$s^*[\sigma] = \rho$, then we obtain one of the constants $0$ or $\rho$ as in the case (1). It remains to
consider the case when $s^*[\sigma] = 0$. But in this case we are already under the conditions
of the first case, which was considered above.


Lemma 9 All constants $0, \rho, \sigma, 1$ are expressible on the diagonalizable algebra $\mathfrak{B}_2$ via
the terms $f_i$, $i = 3, \dots, 10$, via any unary terms $a$ and $b$ which satisfy the corresponding
conditions (12) and (16), and via any constant $0$ or $\rho$.


Proof Let us convince ourselves that one of the following systems of terms (34) is
expressible via one of the constants $0$ or $\rho$ and the term $a$:

$$\{0, \sigma\}, \quad \{0, 1\}, \quad \{\rho, \sigma\}, \quad \{\rho, 1\}. \qquad (34)$$

Let us consider the constant $0$. By (12) we have $a[0] \in \{\sigma, 1\}$, which
means we have at least one of the first two systems of (34). Suppose we have the
constant $\rho$. Then by Theorem 1 we have $a[\rho] \in \{\sigma, 1\}$, and by the same reasoning as
in the case of the constant $0$ we conclude that we have at least one of
the last two systems of (34).

We will show in the following that via every system of terms of the list (34) and
via the corresponding terms $f_3, f_4, f_5, f_6$ one of the following systems of
constants is expressible:

$$\{0, \rho, \sigma\}, \quad \{0, \rho, 1\}, \quad \{0, \sigma, 1\}, \quad \{\rho, \sigma, 1\}. \qquad (35)$$

Let us consider the system of terms $\{0, \sigma\}$. Examine the term $f_3$. Obviously, via
$f_3$ and the constants $0$ and $\sigma$ some term $f_3'(p, q)$ is expressible which satisfies the
condition $f_3'[0, \sigma] \in \{\rho, 1\}$. Hence, we obtain one of the systems of terms $\{0, \rho, \sigma\}$
or $\{0, \sigma, 1\}$. In a similar way we obtain:

• the system $\{0, \rho, 1\}$ or the system $\{0, \sigma, 1\}$ via $\{0, 1\}$ and $f_4$;
• the system $\{0, \rho, \sigma\}$ or the system $\{\rho, \sigma, 1\}$ via $\{\rho, \sigma\}$ and $f_5$;
• the system $\{0, \rho, 1\}$ or the system $\{\rho, \sigma, 1\}$ via $\{\rho, 1\}$ and $f_6$.

In a similar manner we obtain that all constants of the system $\{0, \rho, \sigma, 1\}$ are
expressible on $\mathfrak{B}_2$ via every system of terms of (35) and the corresponding terms
$f_7, f_8, f_9, f_{10}$.


Now we are ready to formulate the conditions for expressibility of constants on the
diagonalizable algebra $\mathfrak{B}_{\infty}$.


Theorem 9 Suppose the terms $f_1, \dots, f_{12}$ do not preserve the corresponding
relations $R_1, \dots, R_{12}$ on the diagonalizable algebra $\mathfrak{B}_2$. Then the constants $0, \rho, \sigma, 1$ are
expressible on $\mathfrak{B}_{\infty}$ via the terms $f_1, \dots, f_{12}$.


Proof Indeed, taking into consideration Lemma 3 and Theorem 8 proved above,
we conclude that the theorem holds.


6 Conclusions


The results presented in the paper can be interpreted from at least three points of view:
the logic view (with each diagonalizable algebra we can associate a propositional
provability logic), the view of Post iterative algebras (expressibility of terms can be interpreted
as corresponding algebraic Mal’cev operations), and category theory (clones of
operations viewed in the context of category theory).

To find all necessary and sufficient conditions for a system of terms to be
complete relative to expressibility of terms on the diagonalizable algebra $\mathfrak{B}_{\infty}$, it is necessary
and sufficient (according to Theorem 6) to establish the corresponding necessary and
sufficient conditions on the diagonalizable algebra $\mathfrak{B}_5$, which contains 32 elements.

It is also possible to examine other ways of obtaining diagonalizable terms
from other ones. For example, one may consider expressibility of terms related to
primitive positive clones [3].


References 


1. Bernardi, C.: Modal diagonalizable algebras. Boll. Unione Mat. Ital. 15-B(1), 15-30 (1979)
2. Blok, W.J.: Pretabular varieties of modal algebras. Stud. Log. 39(2-3), 101-124 (1980)
3. Burris, S., Willard, R.: Finitely many primitive positive clones. Proc. Am. Math. Soc. 101,
427-430 (1987)
4. Grätzer, G.: Universal Algebra. Springer Science & Business Media (2008)
5. Kuznetsov, A.V.: On functional expressibility in superintuitionistic logics (Russian).
Matematicheskie Issledovaniya 6(4), 75-122 (1971)
6. Kuznetsov, A.V.: On detecting non-deducibility and non-expressibility. In: Logical Deduction,
Nauka, Moscow, pp. 5-33 (1979)
7. Magari, R.: The diagonalizable algebras. Boll. Unione Mat. Ital. 12(3), 117-125 (1979)
8. Mal’cev, A.I.: Iterative algebras and Post’s varieties (Russian). Algebra i Logika (Sem) 5(2),
5-24 (1966)
9. Maksimova, L.L.: Continuum of normal extensions of the modal logic of provability with the
interpolation property. Sib. Math. J. 30(6), 935-944 (1989)
10. Post, E.L.: Introduction to a general theory of elementary propositions. Am. J. Math. 43,
163-185 (1921)
11. Post, E.L.: Two-Valued Iterative Systems of Mathematical Logic. Princeton (1941)
12. Rata, M.F.: A formal reduction of the general problem of the expressibility of formulas in the
Gödel-Löb provability logic. Discrete Math. Appl. 12(3), 279-290 (2002)
13. Ratsa, M.F.: On functional completeness in modal logic S5 (Russian). In: Investigations on
Non-classical Logics and Formal Systems, Moscow, pp. 222-280 (1983)
14. Ratsa, M.F.: Expressibility in Propositional Calculi (Russian). Stiinta, Chisinau (1991)
15. Solovay, R.M.: Provability interpretations of modal logic. Israel J. Math. 25, 287-304 (1975)


An Algorithm for Solving the Sylvester
s-Conjugate Elliptic Quaternion Matrix
Equations


Murat Tosun and Hidayet Hüda Kösal


Abstract In this study, the existence of the solution to Sylvester s-conjugate elliptic
quaternion matrix equations is characterized and the solution is obtained in an
explicit form by means of the real representation of an elliptic quaternion matrix.
Moreover, a pseudo-code for our method is presented. Sylvester conjugate matrix
equations over the complex field form a special class of Sylvester s-conjugate elliptic
quaternion matrix equations. Thus, the obtained results extend, generalize and
complement the scope of Sylvester conjugate matrix equations known in the literature.


Keywords Elliptic quaternions and their matrices - Real matrix representations - 
Sylvester conjugate matrix equation 


1 Introduction 


The Sylvester matrix equation $AX - XB = C$ plays a role in control theory, signal
processing, filtering, image restoration, decoupling techniques for ordinary and partial
differential equations, and block-diagonalization of matrices [1-6]. In [7], Jameson
investigated the Sylvester matrix equation by means of the characteristic polynomial
and derived an explicit solution of the Sylvester matrix equation. In [8], Sauza and
Bhattacharyya established a closed-form finite series representation of the unique
solution of the Sylvester matrix equation. In [9], Dehghan and Hajarian developed
an efficient iterative method for solving the second-order Sylvester matrix equation
$EVF^2 - AVF - CV = BW$. In [10], Wu et al. studied the closed-form solutions
to the generalized Sylvester-conjugate matrix equation. In [11], Zhou discussed
a gradient-based iterative algorithm for solving general coupled Sylvester matrix
equations. In [12], Song and Chen established some explicit solutions to the
quaternion matrix equations $XF - AX = C$ and $XF - A\widetilde{X} = C$ (where $\widetilde{X}$ denotes the
$j$-conjugate of the quaternion matrix $X$) by means of the Kronecker map and the complex
representation of a quaternion matrix.

In the literature, the matrix equation $A\overline{X} - XB = C$, where $\overline{X}$ denotes the complex
conjugate of the complex matrix $X$, has attracted considerable attention. This
equation is called the Sylvester-conjugate matrix equation. In [13], Bevis et al. established
a necessary and sufficient condition for the existence of a solution to the
Sylvester-conjugate matrix equation by the consimilarity of two partitioned matrices related to
$A$, $B$ and $C$. In [14], Bevis et al. obtained the solution of the Sylvester-conjugate matrix
equation in the case where the matrices $A$ and $B$ are both in consimilarity Jordan
form. In [15], Wu et al. proposed an explicit solution for the Sylvester-conjugate
matrix equation with the help of the real representation of a complex matrix. This
solution was neatly expressed by a symmetric operator matrix, two controllability
matrices, and two observability matrices. In [12], Song and Chen studied the generalized
Sylvester-quaternion matrix equation $XF - AX = BY$ and the Sylvester-$j$-conjugate
quaternion matrix equation $XF - A\widetilde{X} = BY$. In [16], Jiang and Ling studied the
problem of solving the quaternion matrix equation $A\widetilde{X} - XB = C$ with the help
of the real representation of a quaternion matrix. They also derived a similarity version
of Roth’s theorem for the existence of a solution to the quaternion matrix equation
$A\widetilde{X} - XB = C$, and obtained closed-form solutions of this equation
in explicit forms. As a special case, that paper also gave some
applications to the solutions of the complex matrix equation $A\overline{X} - XB = C$ and to the
consimilarity of complex matrices. In [17], Kosal and Tosun investigated the
problem of solving the Sylvester conjugate commutative quaternion matrix equations
$A\,{}^{s}\overline{X} - XB = C$ (for $s = 1, 2, 3$) by means of real representations of the
commutative quaternion matrices.

In this paper, we consider the problem of solving the Sylvester s-conjugate
elliptic quaternion matrix equations $A\,{}^{s}\overline{X} - XB = C$ (for $s = 1, 2, 3$) by means of
real representations of the elliptic quaternion matrices, where ${}^{s}\overline{X}$ denotes the matrix
whose entries are the $s$-th conjugates of the entries of the elliptic quaternion matrix $X$.
The Sylvester conjugate matrix equation over the complex field is a special case
of the Sylvester s-conjugate elliptic quaternion matrix equations. Thus, the obtained
results extend, generalize and complement the scope of Sylvester conjugate matrix
equations known in the literature.


2 Elliptic Quaternions


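The set of elliptic quaternions is denoted by

$$H_\alpha = \{a = a_0 + a_1 i + a_2 j + a_3 k : a_0, a_1, a_2, a_3 \in \mathbb{R} \text{ and } i, j, k \notin \mathbb{R}\} \qquad (1)$$

where $i^2 = k^2 = \alpha$, $j^2 = 1$, $ij = ji = k$, $jk = kj = i$, $ki = ik = \alpha j$, $\alpha < 0$
and $\alpha \in \mathbb{R}$ [18]. It is clear that multiplication in $H_\alpha$ is commutative. Addition
of the elliptic quaternions $a = a_0 + a_1 i + a_2 j + a_3 k$, $b = b_0 + b_1 i + b_2 j + b_3 k \in H_\alpha$
is defined as $a + b = (a_0 + b_0) + (a_1 + b_1) i + (a_2 + b_2) j + (a_3 + b_3) k$.
Scalar multiplication of an elliptic quaternion $a \in H_\alpha$ with a scalar $\lambda \in \mathbb{R}$ is defined
as $\lambda a = \lambda (a_0 + a_1 i + a_2 j + a_3 k) = \lambda a_0 + \lambda a_1 i + \lambda a_2 j + \lambda a_3 k$. In addition, the
multiplication of two elliptic quaternions $a, b \in H_\alpha$ is defined as

$$ab = (a_0 b_0 + \alpha a_1 b_1 + a_2 b_2 + \alpha a_3 b_3) + (a_1 b_0 + a_0 b_1 + a_3 b_2 + a_2 b_3) i$$
$$\quad + (a_0 b_2 + a_2 b_0 + \alpha a_1 b_3 + \alpha a_3 b_1) j + (a_3 b_0 + a_0 b_3 + a_1 b_2 + a_2 b_1) k. \qquad (2)$$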
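
The product rule (2) is easy to check numerically. The following is a minimal sketch, assuming an elliptic quaternion is stored as a 4-tuple of its real components $(a_0, a_1, a_2, a_3)$; the function name `emul` and the sample value of $\alpha$ are choices made for this illustration only.

```python
def emul(a, b, alpha):
    """Product of two elliptic quaternions given as 4-tuples, following (2)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + alpha * a1 * b1 + a2 * b2 + alpha * a3 * b3,  # real part
        a1 * b0 + a0 * b1 + a3 * b2 + a2 * b3,                  # i component
        a0 * b2 + a2 * b0 + alpha * a1 * b3 + alpha * a3 * b1,  # j component
        a3 * b0 + a0 * b3 + a1 * b2 + a2 * b1,                  # k component
    )

ALPHA = -2  # any negative real number may be used here
ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
```

With these definitions the defining relations $i^2 = k^2 = \alpha$, $j^2 = 1$, $ij = k$, $jk = i$ and $ik = \alpha j$ can be confirmed by direct evaluation, and swapping the two arguments of `emul` leaves the result unchanged, in line with the commutativity noted above.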


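An elliptic quaternion $a \in H_\alpha$ has three types of conjugate: ${}^{1}\bar{a} = a_0 - a_1 i + a_2 j - a_3 k$,
${}^{2}\bar{a} = a_0 - a_1 i - a_2 j + a_3 k$ and ${}^{3}\bar{a} = a_0 + a_1 i - a_2 j - a_3 k$. Moreover,
the norm of $a \in H_\alpha$ is defined as

$$\|a\| = \sqrt[4]{a\,({}^{1}\bar{a})({}^{2}\bar{a})({}^{3}\bar{a})} = \sqrt[4]{\left[(a_0 + a_2)^2 - \alpha (a_1 + a_3)^2\right]\left[(a_0 - a_2)^2 - \alpha (a_1 - a_3)^2\right]}. \qquad (3)$$

If $a \in H_\alpha$ and $\|a\| \neq 0$, then $a$ has a multiplicative inverse. The multiplicative inverse
of $a$ is given by

$$a^{-1} = \frac{({}^{1}\bar{a})({}^{2}\bar{a})({}^{3}\bar{a})}{\|a\|^{4}}.$$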
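
As a quick sanity check of the norm and inverse formulas, consider the simplest non-real element $a = i$. The computation below is an illustrative worked example and is not taken from the original text; it uses only the multiplication rules and the definitions of the three conjugates given above.

```latex
% Worked example for a = i, i.e. a_0 = a_2 = a_3 = 0 and a_1 = 1.
\begin{align*}
{}^{1}\bar{a} &= -i, \qquad {}^{2}\bar{a} = -i, \qquad {}^{3}\bar{a} = i, \\
\|a\|^{4} &= \left[(0+0)^2 - \alpha(1+0)^2\right]\left[(0-0)^2 - \alpha(1-0)^2\right]
           = (-\alpha)(-\alpha) = \alpha^{2}, \\
a^{-1} &= \frac{(-i)(-i)(i)}{\alpha^{2}} = \frac{i^{2}\, i}{\alpha^{2}}
        = \frac{\alpha\, i}{\alpha^{2}} = \frac{i}{\alpha}, \\
a\,a^{-1} &= \frac{i^{2}}{\alpha} = \frac{\alpha}{\alpha} = 1 .
\end{align*}
```

Since $\alpha < 0$, the norm satisfies $\|i\|^{4} = \alpha^{2} > 0$, so the inverse exists, as expected.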
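Theorem 2.1 Let $a = a_0 + a_1 i + a_2 j + a_3 k \in H_\alpha$. Then $a$ and its conjugates
satisfy the following universal similarity factorization equality (USFE):

$$P\,\mathrm{diag}\left(a, {}^{1}\bar{a}, {}^{2}\bar{a}, {}^{3}\bar{a}\right) P^{-1} = \begin{pmatrix} a_0 & \alpha a_1 & a_2 & \alpha a_3 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & \alpha a_3 & a_0 & \alpha a_1 \\ a_3 & a_2 & a_1 & a_0 \end{pmatrix} \qquad (4)$$

where

$$P = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ \tfrac{1}{i} & -\tfrac{1}{i} & -\tfrac{1}{i} & \tfrac{1}{i} \\ j & j & -j & -j \\ \tfrac{1}{k} & -\tfrac{1}{k} & \tfrac{1}{k} & -\tfrac{1}{k} \end{pmatrix} \quad \text{and} \quad P^{-1} = \frac{1}{2}\begin{pmatrix} 1 & i & j & k \\ 1 & -i & j & -k \\ 1 & -i & -j & k \\ 1 & i & -j & -k \end{pmatrix}. \qquad (5)$$

Proof It is easy to verify that

$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \end{pmatrix} \mathrm{diag}\left(a, {}^{1}\bar{a}, {}^{2}\bar{a}, {}^{3}\bar{a}\right) \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \end{pmatrix} = \begin{pmatrix} 4a_0 & 4a_1 i & 4a_2 j & 4a_3 k \\ 4a_1 i & 4a_0 & 4a_3 k & 4a_2 j \\ 4a_2 j & 4a_3 k & 4a_0 & 4a_1 i \\ 4a_3 k & 4a_2 j & 4a_1 i & 4a_0 \end{pmatrix}. \qquad (6)$$

Also, since

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \tfrac{1}{i} & 0 & 0 \\ 0 & 0 & j & 0 \\ 0 & 0 & 0 & \tfrac{1}{k} \end{pmatrix} \begin{pmatrix} 4a_0 & 4a_1 i & 4a_2 j & 4a_3 k \\ 4a_1 i & 4a_0 & 4a_3 k & 4a_2 j \\ 4a_2 j & 4a_3 k & 4a_0 & 4a_1 i \\ 4a_3 k & 4a_2 j & 4a_1 i & 4a_0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & i & 0 & 0 \\ 0 & 0 & j & 0 \\ 0 & 0 & 0 & k \end{pmatrix} = \begin{pmatrix} 4a_0 & 4\alpha a_1 & 4a_2 & 4\alpha a_3 \\ 4a_1 & 4a_0 & 4a_3 & 4a_2 \\ 4a_2 & 4\alpha a_3 & 4a_0 & 4\alpha a_1 \\ 4a_3 & 4a_2 & 4a_1 & 4a_0 \end{pmatrix} \qquad (7)$$

we get

$$P\,\mathrm{diag}\left(a, {}^{1}\bar{a}, {}^{2}\bar{a}, {}^{3}\bar{a}\right) P^{-1} = \begin{pmatrix} a_0 & \alpha a_1 & a_2 & \alpha a_3 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & \alpha a_3 & a_0 & \alpha a_1 \\ a_3 & a_2 & a_1 & a_0 \end{pmatrix} \qquad (8)$$

where

$$P = \frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ \tfrac{1}{i} & -\tfrac{1}{i} & -\tfrac{1}{i} & \tfrac{1}{i} \\ j & j & -j & -j \\ \tfrac{1}{k} & -\tfrac{1}{k} & \tfrac{1}{k} & -\tfrac{1}{k} \end{pmatrix}, \quad P^{-1} = \frac{1}{2}\begin{pmatrix} 1 & i & j & k \\ 1 & -i & j & -k \\ 1 & -i & -j & k \\ 1 & i & -j & -k \end{pmatrix}. \qquad (9)$$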

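USFE over the elliptic quaternions clearly reveals three fundamental facts:

(1) $H_\alpha$ is algebraically isomorphic to the matrix algebra

$$H'_\alpha = \left\{
\begin{pmatrix}
a_0 & \alpha a_1 & a_2 & \alpha a_3 \\
a_1 & a_0 & a_3 & a_2 \\
a_2 & \alpha a_3 & a_0 & \alpha a_1 \\
a_3 & a_2 & a_1 & a_0
\end{pmatrix} : a_0, a_1, a_2, a_3 \in \mathbb{R} \right\} \tag{10}$$

through the map

$$\varphi : H_\alpha \to H'_\alpha, \qquad \varphi(a) =
\begin{pmatrix}
a_0 & \alpha a_1 & a_2 & \alpha a_3 \\
a_1 & a_0 & a_3 & a_2 \\
a_2 & \alpha a_3 & a_0 & \alpha a_1 \\
a_3 & a_2 & a_1 & a_0
\end{pmatrix}. \tag{11}$$

(2) Every elliptic quaternion $a \in H_\alpha$ has a real matrix representation

$$\varphi(a) =
\begin{pmatrix}
a_0 & \alpha a_1 & a_2 & \alpha a_3 \\
a_1 & a_0 & a_3 & a_2 \\
a_2 & \alpha a_3 & a_0 & \alpha a_1 \\
a_3 & a_2 & a_1 & a_0
\end{pmatrix} \tag{12}$$

in $H'_\alpha$.

(3) All real matrices in $H'_\alpha$ can uniformly be diagonalized over the elliptic
quaternions.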


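It is natural to identify an elliptic quaternion $a \in H_\alpha$ with a real vector $a \in \mathbb{R}^4$. We
will denote any elliptic quaternion $a = a_0 + a_1 i + a_2 j + a_3 k \cong (a_0, a_1, a_2, a_3)^T$. Then the
multiplication of $a$ and $b = b_0 + b_1 i + b_2 j + b_3 k$ can be written with the aid of
ordinary matrix multiplication as

$$ab = ba \cong
\begin{pmatrix}
a_0 & \alpha a_1 & a_2 & \alpha a_3 \\
a_1 & a_0 & a_3 & a_2 \\
a_2 & \alpha a_3 & a_0 & \alpha a_1 \\
a_3 & a_2 & a_1 & a_0
\end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix}
= \varphi(a)\, b \tag{13}$$

where $\varphi(a)$ is called the fundamental matrix of $a$.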
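As an illustration of the fundamental matrix, the sketch below builds $\varphi(a)$ for integer coefficients and checks that the matrix-vector product reproduces the quaternion product. The helper names (`emul`, `phi`, `matvec`, `matmul`) and the sample values, including $\alpha = -3$, are assumptions made for this example only.

```python
def emul(a, b, alpha):
    """Product of two elliptic quaternions given as 4-tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0*b0 + alpha*a1*b1 + a2*b2 + alpha*a3*b3,
        a1*b0 + a0*b1 + a3*b2 + a2*b3,
        a0*b2 + a2*b0 + alpha*a1*b3 + alpha*a3*b1,
        a3*b0 + a0*b3 + a1*b2 + a2*b1,
    )

def phi(a, alpha):
    """Fundamental 4x4 real matrix of the elliptic quaternion a."""
    a0, a1, a2, a3 = a
    return [
        [a0, alpha*a1, a2, alpha*a3],
        [a1, a0, a3, a2],
        [a2, alpha*a3, a0, alpha*a1],
        [a3, a2, a1, a0],
    ]

def matvec(M, v):
    """Ordinary matrix-vector product for a 4x4 matrix."""
    return tuple(sum(M[r][c] * v[c] for c in range(4)) for r in range(4))

def matmul(M, N):
    """Ordinary product of two 4x4 matrices."""
    return [[sum(M[r][k] * N[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

alpha = -3               # illustrative value, any negative real works
a = (1, 2, -1, 4)
b = (0, 5, 2, -2)

product_via_matrix = matvec(phi(a, alpha), b)
product_direct = emul(a, b, alpha)
print(product_via_matrix, product_direct)
```

The same helpers can be used to confirm that `phi` is multiplicative, i.e. that `phi(emul(a, b, alpha), alpha)` equals `matmul(phi(a, alpha), phi(b, alpha))`, and that the trace of `phi(a, alpha)` is `4 * a[0]`.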


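Theorem 2.2 ([18]). For $a, b \in H_\alpha$ and $\lambda \in \mathbb{R}$, the following identities are satisfied:

1. $a = b \Leftrightarrow \varphi(a) = \varphi(b)$,
2. $\varphi(a + b) = \varphi(a) + \varphi(b)$,
3. $\varphi(ab) = \varphi(a)\,\varphi(b)$,
4. $\varphi(\varphi(a)\, b) = \varphi(a)\,\varphi(b)$,
5. $\varphi(\lambda a) = \lambda\, \varphi(a)$, $\lambda \in \mathbb{R}$,
6. $(\varphi(a))^T = \varphi({}^1\bar a)$,
7. $\mathrm{trace}(\varphi(a)) = a + {}^1\bar a + {}^2\bar a + {}^3\bar a$,
8. $\|a\|^4 = |\det(\varphi(a))|$.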
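3 Elliptic Quaternion Matrices


The set of all $m \times n$ type matrices with elliptic quaternion entries is denoted by
$H_\alpha^{m \times n}$. $H_\alpha^{n \times n}$ with usual matrix summation and multiplication is a ring with unity.
An elliptic quaternion matrix $A = (a_{ij}) \in H_\alpha^{m \times n}$ has three types of conjugate:

$${}^1\bar A = ({}^1\bar a_{ij}) \in H_\alpha^{m \times n}, \quad {}^2\bar A = ({}^2\bar a_{ij}) \in H_\alpha^{m \times n} \quad \text{and} \quad {}^3\bar A = ({}^3\bar a_{ij}) \in H_\alpha^{m \times n}.$$

Also, an elliptic quaternion matrix $A = (a_{ij}) \in H_\alpha^{m \times n}$ can be written as $A = A_0 + A_1 i + A_2 j + A_3 k$
where $A_0, A_1, A_2, A_3 \in \mathbb{R}^{m \times n}$. Then ${}^1\bar A = A_0 - A_1 i + A_2 j - A_3 k$,
${}^2\bar A = A_0 - A_1 i - A_2 j + A_3 k$ and ${}^3\bar A = A_0 + A_1 i - A_2 j - A_3 k$. A matrix
$A^T \in H_\alpha^{n \times m}$ is the transpose of $A \in H_\alpha^{m \times n}$. Also $A^{*} = ({}^s\bar A)^T \in H_\alpha^{n \times m}$, $s = 1, 2, 3$, is
called the conjugate transpose according to the $s$th conjugate of $A \in H_\alpha^{m \times n}$, [19].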


Theorem 3.1 ([19]) Let A and B be elliptic quaternion matrices of appropriate
sizes. Then the following are satisfied:


(-4)"=(47), 
(AB)* = B* A*, 
(AB)! = BTA’, 


. *(AB) = (°A) (°B), 

If A~! and B™ exist then (AB)! = B-'A~!, 
. If Aq! exists (A*)~! = (A7!)™, 
(A) = (A). 


NAWAWNM 


4 Real Representations of Elliptic Quaternion Matrices 


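For $X \in H_\alpha^{n \times p}$ and $A = A_0 + A_1 i + A_2 j + A_3 k \in H_\alpha^{m \times n}$, we will define the linear
transformations

$${}^s\phi_A(X) = A\,({}^s\bar X), \quad s = 1, 2, 3. \tag{14}$$

Then, we get the real representations of the elliptic quaternion matrix $A \in H_\alpha^{m \times n}$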


Ag —@A, Ad —aA3 Ag aA, —A2 —aA3 

ii _ A, —Ag A3 —A2 A, Ag —A3 —A2 
Az —aA3 Ag —aA1 A2 aA3 —Ag —Q@A{ 

A3 —A2 A, —Ao A3 Az2 —A, —Ao 

Ao —aA| —A2 aA3 

36 _ A, —Ag —A3 Az 

a= A2 —aA3 —Ao aA, 

A3 —Az2 —A, Ao 


E RAmxdn | 2b4 = € Rimxdn | 


€ Ram x4n : 


(15) 
Proposition 4.1 Let $A, B \in H_\alpha^{m \times n}$, $C \in H_\alpha^{n \times p}$ and $\lambda \in \mathbb{R}$. Then the following
properties are valid:


1 A, Be AR then’ da+p =*ba +* op fors = 1,2, 3. 
2 ACH™*, Be Hee then 


“bas = "ba ('Pn) bn ="da°doq) (PP), 5 = 1,2,3 


where 
I 0 0 0 .,0 0 O 
-I,0 0 Ol 0 O 
Ip _ t Dee t 
Nh, Oh Oal® Per oO, ON? 
0 0 O0-/; 00 0 -/ 
I 0 O 0 
O-I, 0 0 
3 —_ t 
FS NGG). a0 
00 OF 
3. IfA€ H”*" is a regular matrix then 
SBA SA Pn) Cand Py)ig Ol’ SS 1p 3s 
4. For A ¢ Hr" 
A= =m E4m (da) Ea 
A = 3235 Eam (76) Ea 
and ; 
A=~——E m ; EE; 
2 Iq Bam (C04) Een 
where Ey =(hih jl kl) € A. 
5. Ae Am 
P 
O;,| (164) Qn =—('ba), Ry! (bb) Rn = (!oa), St (14) Sn = — (14) 
O7,' 7a) On = Poa), Rn! Coa) Bn =—(?a), Sn! 7a) Sn = — (Poa) 
Q;,| (oa) On =. (oa) ’ R;,! (oa) Rn = (oa) , Sz! (oa) na (oa) 


Cen Coat Pr) ="O.5. For s= 12,3. 
where

Oa, 0 0 0070 
i 0 0 0 0001, 
2=lo 0 0a,|° *=l|r000]° 
007 0 01,00 

oO Oar 

saf0 01 @ 

‘1 O0al,0 0 

0 0 0 


Proof This proposition can also be easily proved by direct calculation. 


5 Solving the Sylvester s-Conjugate Elliptic Quaternion 
Matrix Equations 


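Theorem 5.1 (Roth’s theorem [16]) Let $A \in \mathbb{R}^{m \times m}$, $B \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{m \times n}$.
Then the real Sylvester equation $AX - XB = C$ has some solution (perhaps nonunique)
if and only if

$$\begin{pmatrix} A & C \\ 0 & B \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$$

are similar.


Now, we consider the solutions of the equations

$$A\,({}^s\bar X) - XB = C, \quad s = 1, 2, 3 \tag{16}$$

with the help of the real representations of elliptic quaternion matrices, where $A \in H_\alpha^{m \times m}$,
$B \in H_\alpha^{n \times n}$ and $C \in H_\alpha^{m \times n}$. We define the real representations of the matrix equations (16)
by

$${}^s\phi_A\, Y - Y\,({}^s\phi_B) = {}^s\phi_C. \tag{17}$$


Proposition 5.2 The Eq. (16) has a solution $X$ if and only if the Eq. (17) has a
solution $Y = {}^s\phi_X\,({}^sP_n)$.


Proposition 5.3 The Eq. (17) has a solution $Y$ if and only if the matrices

$$\begin{pmatrix} {}^s\phi_A & {}^s\phi_C \\ 0 & {}^s\phi_B \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} {}^s\phi_A & 0 \\ 0 & {}^s\phi_B \end{pmatrix}$$

are similar.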
Theorem 5.4 The Eq. (17) for $s = 1$ has a solution $Y \in \mathbb{R}^{4m \times 4n}$ if and only if the
Eq. (16) for $s = 1$ has a solution $X \in H_\alpha^{m \times n}$. In that case, if $Y \in \mathbb{R}^{4m \times 4n}$ is a solution
of (17), then the matrix

$$X = \frac{1}{2 - 2\alpha}\, E_{4m}\, (Y')\, E_{4n}^T \tag{18}$$

is a solution of (16) where

$$Y' = \frac{1}{4}\left( Y\,({}^1P_n) - Q_m^{-1} Y\,({}^1P_n)\, Q_n + R_m^{-1} Y\,({}^1P_n)\, R_n - S_m^{-1} Y\,({}^1P_n)\, S_n \right) \tag{19}$$

and

$$E_{4t} = (I_t,\ iI_t,\ jI_t,\ kI_t) \in H_\alpha^{t \times 4t}, \quad t = m, n.$$
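The normalising factor in the recovery formula (18) can be checked in the simplest case $m = n = 1$, where $E_4 = (1, i, j, k)$ and the real representation for $s = 1$ is the fundamental matrix with its second and fourth columns negated. The sketch below is only an illustration under that assumption; `emul`, `phi1` and `recover` are hypothetical names, and $\alpha = -3$ is an arbitrary choice.

```python
from fractions import Fraction

def emul(a, b, alpha):
    """Product of two elliptic quaternions given as 4-tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0*b0 + alpha*a1*b1 + a2*b2 + alpha*a3*b3,
        a1*b0 + a0*b1 + a3*b2 + a2*b3,
        a0*b2 + a2*b0 + alpha*a1*b3 + alpha*a3*b1,
        a3*b0 + a0*b3 + a1*b2 + a2*b1,
    )

def phi1(x, alpha):
    """Real representation for s = 1 of a single quaternion x (case m = n = 1)."""
    x0, x1, x2, x3 = x
    return [
        [x0, -alpha*x1, x2, -alpha*x3],
        [x1, -x0, x3, -x2],
        [x2, -alpha*x3, x0, -alpha*x1],
        [x3, -x2, x1, -x0],
    ]

UNITS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]  # 1, i, j, k

def recover(Y, alpha):
    """Evaluate E Y E^T / (2 - 2*alpha) with E = (1, i, j, k); alpha is an int."""
    total = (0, 0, 0, 0)
    for r in range(4):
        for c in range(4):
            unit_product = emul(UNITS[r], UNITS[c], alpha)  # e_r * e_c
            total = tuple(t + Y[r][c] * u for t, u in zip(total, unit_product))
    return tuple(Fraction(t, 2 - 2*alpha) for t in total)

alpha = -3
x = (1, 2, -1, 4)
x_back = recover(phi1(x, alpha), alpha)
print(x_back)
```

Here the quaternion is recovered exactly from its real representation, which is the role played by (18) once $Y'$ has the structure of a real representation.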


Proof Suppose that the real matrix 


Yur Yio Ys Vig 
Yo1 Yoo Yo3 Yo4 
Y31 Y32 Y33 Y3q 
Y4, Y42 Y43 Yq4 


» Vy € R"™, u,v = 1,2,3,4 (20) 


is a solution to (17), then we show that the matrix given in (18) is a solution 
to (16). Since oO. (‘dx) On = (dx) ’ R;, (dx) Rn = (dx) ’ Sa (dx) Sn = 
_ (dx) and Y = !¢ x P, we have 


1a (—Qin'¥ (Pn) Qn) (Pn) — (— Qin ¥ (Pn) On) (P Px) ('O8) = (Gc) 5 
1a (Rn ¥ (Pa) Rn) (Pa) — (Rin! ¥ (" Pr) Bn) (Pa) (Ca) = ec). 
a (—Sm'¥ (Pa) Sn) (CPn) = (= Sin! ¥ (FPr) Sn) U Pn) (ba) = Ce). oi 


This equation shows thatif Y isa solution to (17), then (— Q7,'Y (' Pn) Qn) (' Pn). 
(R,,1Y (1 Px) Rn) (1Pn) and — (S;,1¥ (1 Pi) Sn) (1 Pn) are also solutions to (17). Thus 
the following real matrix: 

y’ = (Y ~ (-0;,'Y (CP, ) On ae R'Y (’ P,,) Rn _ sly (P;) Sn) (' P,)) 

(22) 
is a solution to (17). Now substituting (20) in (22) and the simplifying the expression, 
we easily get 


Ale 


Zo -—aZ| Zo —aZ3 
Z| —Zo 23 —Zo 
Zo —aZy Zo -—aZ, 
Z3 —Z, Z, —Zo 


‘ox = 
where
Z,=1(Y — 1(Yr Ya 
0 = 7 (Mi + Yoo + ¥33 + Yaa), Z, =f (24+ ¥%1+ + Yu), 
(23) 
Zp = 4 (Viz + You + Yai + Yao), Z3 = 5 (4+ Yo3 + 2 4+ Yui). 


a 
Thus we obtain 

I, 

x= : Cin tlm jl kim) Y' in 


2—2a dn 
kl, 


Theorem 5.5 The Eq.(17) for s = 2 has a solution Y € R*"**" if and only if the 
Eq.(16) for s = 2 has a solution X € H™*". In that case, if Y € R*”**" is a solution 
of (17), then the matrix 


1 


X= 53a E4m (Y') ER (24) 
is a solution of (16) where 
Y= ; (y CP.) +O; ¥ CP.) O.— RB, ¥ CP.) Ra — 5S, ¥ CPi) Sn) CS) 


Proof The proof of this theorem is similar to that of Theorem 5.4. 


Theorem 5.6 The Eq.(17) for s = 3 has a solution Y € R*"**" if and only if the 
Eq.(16) for s = 3 has a solution X € H™*". In that case, if Y € R*"**" is a solution 
of (17), then the matrix 


1 
X = 5 Esm (Y’) Ey, (26) 
is a solution of (16) where 
y’= ; (Y CP.) — O7'¥ C Pr) On — Ry Y C Pn) Rn + Sy'¥ CPx) Sn). (27) 


Proof The proof of this theorem is similar to that of Theorem 5.4. 
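6 Numerical Algorithms


Based on the results in the previous section, in this section we supply numerical
algorithms for solving the elliptic quaternion matrix equation $A\,({}^s\bar X) - XB = C$, $s = 1, 2, 3$.


1. Input $s$
2. Input $A$, $B$ and $C$ ($A \in H_\alpha^{m \times m}$, $B \in H_\alpha^{n \times n}$, $C \in H_\alpha^{m \times n}$)
3. Form $Q_t$, $R_t$, $S_t$, ${}^sP_t$, $t = m, n$
4. If $s = 1$ then
      Compute ${}^1\phi_A$, ${}^1\phi_B$, ${}^1\phi_C$
      If $\begin{pmatrix} {}^1\phi_A & {}^1\phi_C \\ 0 & {}^1\phi_B \end{pmatrix}$ and $\begin{pmatrix} {}^1\phi_A & 0 \\ 0 & {}^1\phi_B \end{pmatrix}$ are similar then
         Compute $Y'$ according to (19)
         Calculate $X$ according to (18)
      else
         There is no solution
5. If $s = 2$ then
      Compute ${}^2\phi_A$, ${}^2\phi_B$, ${}^2\phi_C$
      If $\begin{pmatrix} {}^2\phi_A & {}^2\phi_C \\ 0 & {}^2\phi_B \end{pmatrix}$ and $\begin{pmatrix} {}^2\phi_A & 0 \\ 0 & {}^2\phi_B \end{pmatrix}$ are similar then
         Compute $Y'$ according to (25)
         Calculate $X$ according to (24)
      else
         There is no solution
6. If $s = 3$ then
      Compute ${}^3\phi_A$, ${}^3\phi_B$, ${}^3\phi_C$
      If $\begin{pmatrix} {}^3\phi_A & {}^3\phi_C \\ 0 & {}^3\phi_B \end{pmatrix}$ and $\begin{pmatrix} {}^3\phi_A & 0 \\ 0 & {}^3\phi_B \end{pmatrix}$ are similar then
         Compute $Y'$ according to (27)
         Calculate $X$ according to (26)
      else
         There is no solution.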
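For orientation, the case $s = 1$ with $m = n = 1$ can be worked through by hand-sized code. The sketch below does not implement the similarity test or formulas (18) and (19); as a simplification it solves the equivalent real $4 \times 4$ linear system for the coefficient vector of $X$ directly, using $a\,({}^1\bar x) \cong \varphi(a)\,\mathrm{diag}(1,-1,1,-1)\,x$ and $xb \cong \varphi(b)\,x$. All function names and sample values (including $\alpha = -2$) are assumptions for this example.

```python
from fractions import Fraction

def emul(a, b, alpha):
    """Product of two elliptic quaternions given as 4-tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0*b0 + alpha*a1*b1 + a2*b2 + alpha*a3*b3,
        a1*b0 + a0*b1 + a3*b2 + a2*b3,
        a0*b2 + a2*b0 + alpha*a1*b3 + alpha*a3*b1,
        a3*b0 + a0*b3 + a1*b2 + a2*b1,
    )

def conj1(x):
    """First conjugate: signs of the i and k parts are flipped."""
    x0, x1, x2, x3 = x
    return (x0, -x1, x2, -x3)

def phi(a, alpha):
    """Fundamental 4x4 real matrix of a."""
    a0, a1, a2, a3 = a
    return [[a0, alpha*a1, a2, alpha*a3],
            [a1, a0, a3, a2],
            [a2, alpha*a3, a0, alpha*a1],
            [a3, a2, a1, a0]]

def solve_scalar_s1(a, b, c, alpha):
    """Solve a*conj1(x) - x*b = c for one quaternion x; None if singular."""
    pa, pb = phi(a, alpha), phi(b, alpha)
    sign = (1, -1, 1, -1)
    # Augmented system [phi(a)*diag(sign) - phi(b) | c] over the rationals.
    M = [[Fraction(pa[r][k] * sign[k] - pb[r][k]) for k in range(4)]
         + [Fraction(c[r])] for r in range(4)]
    for col in range(4):  # Gauss-Jordan elimination with row swaps
        pivot = next((r for r in range(col, 4) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(4):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return tuple(M[r][4] for r in range(4))

alpha = -2
a, b = (1, 1, 0, 0), (0, 0, 1, 0)          # a = 1 + i, b = j
x_true = (1, 2, 3, 4)
lhs = emul(a, conj1(x_true), alpha)
rhs = emul(x_true, b, alpha)
c = tuple(l - r for l, r in zip(lhs, rhs))  # right-hand side built from x_true
x = solve_scalar_s1(a, b, c, alpha)
print(x)
```

For larger $m$ and $n$ the same idea leads to the real Sylvester-type equation (17), for which dedicated solvers are preferable to dense elimination.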
7 Numerical Examples


Let us solve the elliptic quaternion matrix equation


li\ap_ y(10\_ (+s 247 +k 
GVO 8G)= Gas a) 


with the help of the Theorem 5.4, real representation of given matrix equation is 


10 0 -a00 0 O 10 0000 0 0 
00-a 0010 0 00 0000 0 0 
01-1 0000 0 00-1000 0 0 
10 0 0000 -!1 y-y 00 0000 0 0 
000 0100 -a 00001000 
010 0 00-a 0 00 0000 0 0 
000 001-1 0 00 0 000-10 
00 0 -110 0 0 00 0000 0 0 

0 l—-a2a 0O -a 1 0 -a 

—a 1 a -a 0O 0 a 0 

—2 0 O-l+a 0 1 a ~-l 
—{-l 1 a -—-l -1 0 0 0 
~t[-a 1 0 -a@ 01-a2a 0 

0 0 a 0 -a 1 a —-a 

0 1 a --l -2 0 O-l+a 

-1 0 0O 0 -! 1 a ~-tl 


If we solve this equation, we have 


O0laN0100 
000a0la0 
10010001 
01001001 
010001la0 
Olad000a 
00011001 
10010100 


Finally, we obtain 
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O1=e'0 04 0 © 
00 0 -a01l-a 0 
020° 100 0: <1 
_[010 0100 -1 
Ce Vijit- 0 Doane 6 
01-—a 0 00 0 -a 
00 0 -110 0 -1 
10 0 -101 0 O 


and 
Ol-a 0010 0O 10 
00 0 -a0dl-a 0 01 
10 0 -100 0 -1 i 0 
yo 10:;0j70k0 010 0100 -!1 Oi 
~ 22a\ 010107 0k 010 0 01-a 0 j 0 
O0l-a 0 00 0 -a Oj 
00 0 -110 0 -1 k0 
10 0 -101 0 0O Ok 
_filtiy 
= N Epp ee 
Deterministic Actions on Stochastic Ensembles of Particles Can Replicate
Wavelike Behaviour of Quantum Mechanics: Does It Matter?


Dale R. Hodgson and Vladimir V. Kisil 


Abstract The probabilistic interpretation of quantum mechanics has been a point of
discussion since the earliest days of the theory. The development of quantum
technologies transfers these discussions from philosophical interest to practical
importance. We propose a synthesis of ideas ranging from those of the field’s founders
to modern contextual approaches. The concept is illustrated by a simple numerical
experiment imitating photon interference in two beam splitters. This example
demonstrates that deterministic physical principles can replicate wavelike probabilistic
effects when applied to stochastic ensembles of particles. We also reference other
established experimental evidence of the same phenomenon. Consequences for quantum
information technologies are briefly discussed as well.


1 Introduction 


The recent quantum technology roadmap [1] invites a wider community to discuss 
fundamental aspects of the emerging quantum technology. It is instructive to com- 
pare the present proposal with an earlier one [24], see also the discussion in [14]. 
Clearly, we now have a better understanding of the challenges confronting further 
development and a more cautious prediction of expected progress. Yet some more 
fundamental discussion may still be relevant. 

Superposition of quantum states is a crucial element of quantum information 
as highlighted in the abstract of [1], see also [4, 38]. However, the meaning of 
superposition depends on the interpretation of the wavefunction: 
• whether it describes the nature of an individual quantum object, a qubit, say; or
• it only exists as a representation of the properties of an ensemble of identical
objects.


The interpretation of the wavefunction has been disputed since the early years of quantum
theory, which has not, however, prevented its use for accurate experimental predictions.
Obviously, taking the wavefunction for granted is only justified for experiments
with large ensembles of objects. The emergence of quantum technologies reliant on
individual quantum systems, e.g. qubits, reignites the significance of the wavefunction
question. In these notes we review some classical works on “wave mechanics” and
revive their ideas with a new twist.


2 Origins of Wave Mechanics in Relativity and Paradoxes 
of Measurement 


Nowadays, the fields of quantum mechanics and relativity are often treated as being at odds
with each other, or at least as technically incompatible. The nature of entanglement,
the EPR paradox and Bell’s theorem cause the main disagreements between the
two theories. Recent experiments [20, 21] claim to fully demonstrate the instantaneous
action-at-a-distance effects of quantum mechanics, in violation of the local and
continuous model of relativity. However, in the formative years of quantum mechanics,
relativity was not seen as such an antagonist to quantum theory. Moreover, in
the 1920s relativity was considered a new fundamental law of nature which could
be the ultimate source for any physical theory.

In the celebrated paper [12] de Broglie introduced a wave-like nature of quantum
objects by considering the laws of special relativity, see also his later revision [13]. He
elegantly combined quantum theory with relativity and laid the foundations for what
would become Pilot-Wave Theory. Similarly, but often overlooked now, Schrödinger
appealed to relativistic principles in his work on the uncertainty principle,
see the recent discussion in [25]. This confirms that the main mathematical tools
of the new quantum mechanics, the wave function and the Schrödinger equation, are
not fundamentally at odds with relativity.

The disagreement between the two theories (or rather their interpretations) was exposed
later by the famous EPR paradox [15], which relies on the wavefunction being designated
to describe an individual quantum system. Furthermore, ‘wavefunction collapse’ and the
‘projection postulate’ have to be introduced to explain the effect of repeated measurement
of an individual system. It is not an exaggeration to say that all ‘paradoxes’ of
quantum mechanics are rooted in the attribution of the wavefunction to an individual
quantum system. A good illustration is the textbook [19], where the authors managed
to avoid all difficult questions of interpretation until the very last chapter, which
introduces wave function collapse and quantum measurement. Attempts to ‘resolve’
their paradoxical consequences open the door to even more exotic theories of
many-worlds or even many-minds types [2, 16].
3 De Broglie–Bohm Pilot Wave Theory and Contextuality


The existing mathematical model of quantum theory gives a good description of 
physical observations, but does not attempt to explain any mechanism for such results. 
This omission is raised by John Bell as an argument that current quantum theory
cannot be an ultimate one [6]. Other interpretations of quantum mechanics have
been proposed; here we choose to focus on those of a contextual nature such as [18,
27, 28].

The use of ‘realism’ in quantum mechanics has been a point of discussion since
the days of Bohr and Einstein. Most crucially, the famous Einstein–Podolsky–Rosen
paper [15] discussed the place of elements of physical reality in quantum theory. In
light of formal ‘no-go’ theorems [5, 10], it is usually accepted that realism has no
place in a quantum framework. Here, we are inclined to share the view of contextualists
such as Khrennikov, that realism can be included, provided we accept that
any physically measured result is a product of both one’s target object and the system
with which it has interacted in order to show such a result. Interestingly, Niels Bohr’s
original viewpoint may be more accurately reflected in contextuality than
by the present dried-out flavour of the orthodox Copenhagen interpretation [28].

Indeed, it should seem natural to consider that any object which never interacts 
with any other system in the universe has, in effect, no part in reality, and that in any 
situation where we have learned some physical property of an object, we have done so 
only by the interaction of said object with some measuring apparatus. Furthermore, 
it should seem unnatural to take the view of the quantum mechanics orthodoxy, that a
‘measurement operation’ on a quantum system is some totally disparate hand of God
which collapses a wavefunction to some result and then returns to the aether. The
observation of an experimental outcome is always the result of interaction between
object and apparatus, but the apparatus itself must also be a physical body and
hence subject to some effect of the interaction. It is this final point, that the apparatus
is mutually affected by all its quantum interactions, which we shall espouse in our
proposed model.

Since the above assumptions are very plausible, it is natural that they have already
appeared in the literature. Different models of such interaction of a quantum system
and apparatus were independently considered in [26, 29]. A few years later
deterministic objects with wave-like collective behaviour were physically realised as
droplets [11]: these are ensembles of well-localised classical objects which interact
with each other through waves spreading in the surrounding media. One can easily
interpret this framework as an explicit materialisation of the de Broglie–Bohm pilot
wave theory [8, 22, 23, 37], which was approached from a different theoretical
setup in [36]. Closer to the quantum world, there is a recent confirmation of the
concept performed on photons [17]. It is not surprising that ideas similar to [26, 29]
were independently proposed in later works, e.g. [33].

In fact the previous discussion [29] can be linked to a lesser-known work [7] on
wavefunctions, and we shall adopt key parts of its notation. In this work [7] Bohm and
Bub proposed that the probabilistic results of quantum mechanical experiments could
be explained by assigning to a quantum object not just a single wavefunction $|\psi\rangle$
but also a second, dual-space vector $\langle\xi|$; then, with some simple deterministic rules,
the outcome of a measurement depends on the states of both vectors together [7].

Our proposal is to decouple this second vector $\langle\xi|$ from the quantum system and
attach it to the measuring apparatus or the contextual environment in a general
sense. In other words, we take Bohm and Bub’s dual-vector to represent the quantum
state of the physical apparatus which necessarily interacts with a quantum object in
order to output a measurement result.

Clearly, this attribution does not change the mathematical theory used in [7] and
preserves all desired logical consequences. On the other hand, the proposed attachment of
the vector $\langle\xi|$ to the apparatus is completely in the spirit of the contextual
interpretation of quantum mechanics and restores the epistemological balance between
a quantum system and apparatus during the measurement process. Furthermore, we
eliminate the need for a carrier medium (a sort of ether) for the pilot wave as well as the
necessity to consider an ‘empty wave function’, which spreads in the vacuum and
carries no energy or momentum [37]. Finally, the vector $\langle\xi|$, being attached to
a physical object, the measuring apparatus, removes the need for non-local theories.


4 Illustration by a Numerical Experiment 


As a simple illustration of the proposed framework we produce the following numerical
experiment for the textbook example of single-photon interference through two
beam splitters [4, Sect. 2.4.1], cf. Fig. 1. In this common experiment, a single photon
source emits quanta towards a 50/50 beam splitter (BS1), with its two output paths
aligned to intersect at a second beam splitter (BS2). The arrangement is such that
the path length of one route may be varied. The value $e^{i\phi}$ represents the phase
change caused by an increased path length, and the two counters D1 and D2 simply
detect the final position of a photon after passing through the system.

There are no claims that our numerical model represents the actual physical setup. 
Instead we pursue the following two more modest goals: 


Fig. 1 The standard two beam splitters arrangement in Mach-Zehnder interferometer
1. Present an explicit implementation of the proposed contextual model with two
   wave functions.

2. Demonstrate that the new model successfully replicates quantum behaviour
   through deterministic particles, similarly to early theoretical models [26, 29]
   and physical experiments with droplets [11] and photons [17].


In a sense, our numerical model is a contemporary variation of the ‘thought experiment’
concept that has been used for a long time.

In a wave model, this experiment is simply described by the first beam splitter
halving the intensity of incident waves, and the second beam splitter recombining
the waves from both paths. When the path length of one route is altered, the waves
arrive out of phase and we see wave-like interference of the light as different intensities
registered by detectors D1 and D2. A particle model on the other hand, where we may
introduce individual photons to such an apparatus, struggles to describe the observed
interference effects without introducing a wave-particle duality to our model of light.

We seek to describe a deterministic particle model showing interference effects
without the introduction of any dual nature. We do this by considering the ‘internal
periodic phenomenon’ which formed the beginnings of de Broglie’s work on waves
and quanta [12, 13]. We model the internal periodic phenomenon as a complex phase
with some given frequency $\nu$ and initial value $\phi_0$:

$$\exp\left(i(\nu t + \phi_0)\right).$$

Where the wave model tells us that two waves meet and interact at the second beam
splitter, we wish to describe each photon as having travelled a single definite path.
Then we must allow for the interaction between phases of each particle. To allow for
the interference between two particles emitted at different points in time, we note
the physical context of the apparatus. In effect, the apparatus has its own internal
periodic phenomenon. The interaction of both wave phenomena of the particle and
the apparatus changes both by some deterministic rule.

Effectively the wave phenomenon of the apparatus serves as some ‘memory’ of
the phase of particles with which it interacted before. Then, subsequent particles
passing the apparatus are affected by their predecessors. This is fully in line with
our previous argument that any experimental apparatus is itself a quantum system,
and hence must possess its own internal periodic phenomena. Other ‘toy models’ also
exist supporting this variety of theory [29, 33]. The interaction of particles
with subsequent emissions clearly falls within the realms of locality, giving hope that
such models may help to explain more adequately the effects typically attributed to
instantaneous wavefunction collapse.

We have a simple numerical simulation of such a model. It is based on the fol- 
lowing assumptions: 


• Photons are emitted from a coherent source at random time intervals.

• The parts of the apparatus (beam splitters) with which the photons interact have
  their own phase and frequency, comparable to the $\langle\xi|$ vector of Bohm’s ‘double
  solution’.

• The corpuscular photons have an instantaneous interaction effect with beam splitters.

• At the instant of interaction, the beam splitter reflects (sending the photon down
  path 1) if and only if the phase difference between the particle and beam splitter
  is less than $\pi$. Otherwise, the particle is transmitted (taking path 2).

• If a reflection occurs, the phases take new values, proportional to the phases at the
  moment of interaction. If a particle passes through without reflection, both phases
  are unchanged.


Fig. 2 Results of the numerical experiment: there is a sine-like dependence of the percentage of
reflections on the optical path difference (vertical axis: % of reflections)


In other words, the post-interaction phase change acts as the ‘apparatus memory’. 
Note that if the two paths available are of different lengths, and a particle has taken 
a path different to its predecessor, then there will be a difference between particles’ 
phases and cumulative ‘memory’ phase of the second beam splitter. 
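The interaction rule can be sketched in a few lines. The snippet below is a condensed restatement of the `BeamSplitter` routine from Appendix A, with phases normalised to the interval [0, 2) and the reflection test written exactly as it is there (|Delta - 1| < 0.5). The lower-case function names, the use of a seeded `random.Random` instance and the photon count are choices made for this illustration only.

```python
import random

RATIO = 20.0  # how much heavier the beam splitter atom is than the photon (Appendix A value)

def beam_splitter(phase, xi, ratio=RATIO):
    """Apply one photon/beam-splitter interaction; return (phase, xi, reflected)."""
    delta = (phase - xi) % 2
    reflected = abs(delta - 1.0) < 0.5
    if reflected:
        # Both phases change; the splitter keeps a small 'memory' of the photon.
        xi = (xi + delta / (1.0 + ratio)) % 2
        phase = (phase + 0.5 - delta * ratio / (1.0 + ratio)) % 2
    return phase, xi, reflected

def reflection_rate(n_photons, rng):
    """Fraction of photons reflected by a single beam splitter."""
    xi = 2.0 * rng.random()          # random initial phase of the splitter
    hits = 0
    for _ in range(n_photons):
        _, xi, reflected = beam_splitter(2.0 * rng.random(), xi)
        hits += reflected
    return hits / n_photons

rate = reflection_rate(20000, random.Random(1))
print(rate)
```

With uniformly random incoming phases a single splitter reflects about half of the photons; the interference pattern only appears once two such splitters are chained and the path difference is varied, as in the full program of Appendix A.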

The proposed realisation, see Appendix A, of this model demonstrates the fol- 
lowing features: 


1. The 50/50 distribution of reflection/transmission seen for a stream of photons 
interacting with a single beam splitter. 

2. A variation from 50/50 distribution for two beam splitters, the achieved maximal 
deviation is around 25/75. 

3. The final reflection rate varying periodically with the alteration of path length
   between paths 1 and 2.


Indeed, a computer simulation of $10^5$ particles shows a correct distribution in one
beam splitter, and over 50 steps in optical path length difference succeeds in showing
an interference effect, see Fig. 2.
5 Conclusion: Why Does It Matter?


It is a widespread belief that one-photon interference observed in a Mach-Zehnder
interferometer firmly attributes a wave function to an individual quantum object [4].
However, this interpretation requires the ‘wave function collapse’ to explain results
of consecutive measurements. Furthermore, the wave function collapse introduces a
variety of paradoxes [19] with action-at-a-distance as an example. The corresponding
concept of quantum entanglement [19] is so fragile that its dialectic opposition,
spontaneous and uncontrolled disentanglement, is often introduced. An attempt to
employ these paradoxes for, say, quantum cryptography [1] puts such future
technologies on a potentially shaky basis [14].

The numerical simulation presented in this paper shows that one-photon interference
is compatible with an interpretation of the wave function as a description of
large ensembles of deterministic particles which self-interact through the surrounding
context. This picture is a blend of the de Broglie–Bohm pilot wave theory [8, 22,
23, 37] with the contextual interpretation of quantum mechanics [18, 27, 28]. This
viewpoint is also supported by physical evidence [11, 17]; however, some new specially
designed experiments are required to test the reality of the de Broglie–Bohm pilot
wave and deterministic quantum paths [36].

Combining some classical ideas with recent developments, we attribute the second
wave function from [7] to the measuring apparatus or more generally to the context
of the quantum system. This does not alter the mathematical model of quantum
mechanics and preserves all its predictions for an ensemble of quantum objects
described by the traditional wave function. However, consequences for an individual
quantum object are significant: they can have a deterministic nature, as was theoretically
predicted [26, 29, 33] and experimentally observed [11, 17].

Possibly, the attribution of the second wave function to the context is a matter of
convenience and aesthetic choice rather than a necessity. Similarly, one cannot be forced
to accept the Galilean model of the Solar system and may use sufficiently elaborate
Ptolemaic epicycles for successful launches of satellites. Yet there are important practical
consequences of our discussion. Namely, statistical observations of quantum
systems do not prove that an individual trapped ion is carrying a superposition of
pure states and thus can be used as an implementation of a qubit, as is commonly
assumed [1, 24]. Instead, a realisation of a single qubit may require the management
of a large ensemble of quantum objects and sufficient control of their self-interaction
through the context in order to avoid their disentanglement. Those two tasks are on
a completely different technological scale from operations on a single particular
trapped ion [14].

It is a common perception now that the next big step in quantum information
requires much more efficient noise control. Optimistically it is compared to tackling
static electricity in the early days of lamp computers. However, it is not excluded
that the ‘it is only the noise’ problem for quantum computers may be on a par with the
‘it is only the friction’ obstacle to creating perpetual motion. In other words, the theoretical
foundations of quantum information technology require further scrutiny and broad
discussion [14, 30].
As a final remark, our work should not be confused with a search for a probabilistic
model of quantum mechanics, cf. [3, 9, 34]. We do not try to remove complex-valued
amplitudes from the theory and replace them by real (but possibly negative)
probabilities. The role of complex scalars in quantisation [31, 32, 35, 36] can
hardly be overestimated.


Appendix A Source code 


import matplotlib.pyplot as plt
from random import random, seed


# Initialise the random number generator
seed()


####################################################
# Functions and preliminaries
####################################################


# Phases are normalised to the interval [0,2)
# for simplicity. Variable phase for beam
# splitter 1 (BS1), Xi1, starts from
# a random value.
Xi1 = 2.0*random()


# Variable phase for BS2, Xi2, starts from
# a random value.
Xi2 = 2.0*random()


# Measure of how beam splitter atom is heavier
# than photon.
ratio = 20.0


# Function describing the effect of a beam splitter.
def BeamSplitter(Phase, Xi):

    # Difference between photon phase (Phase), and beam
    # splitter phase (Xi).
    Delta = (Phase-Xi) % 2

    # Reflection condition.
    reflection = (abs(Delta-1.0) < .5)

    if reflection:

        # Both phases are transformed by the interaction:

        # Beam splitter phase is changed.
        Xi = (Xi+Delta/(1.0+ratio)) % 2

        # Photon phase is changed, with a .5 offset.
        Phase = (Phase+.5-Delta*ratio/(1.0+ratio)) % 2

    return Phase, Xi, reflection


# Function to check the distribution in a single beam
# splitter.
def PhotonInOneBS(Phase):

    global Xi1
    _, Xi1, reflection = BeamSplitter(Phase, Xi1)

    if reflection:
        return 1
    else:
        return 0


# Function to simulate two consecutive beam splitters by
# running the BeamSplitter() procedure twice, introducing
# an optical difference if the (simulated) photon is
# reflected by the first beam splitter.
def PhotonInTwoBS(Phase, OpticalDifference):

    global Xi1
    global Xi2

    # First beam splitter.
    Phase, Xi1, reflection = BeamSplitter(Phase, Xi1)

    # Add the optical path difference for reflected photon.
    if reflection:
        Phase = (Phase+OpticalDifference) % 2

    # The same rules are applied to the second beam
    # splitter but the outgoing phase is of no consequence.
    _, Xi2, reflection = BeamSplitter(Phase, Xi2)

    if reflection:
        return 1
    else:
        return 0


####################################################
# Simulated experiment
####################################################

# Number of photons in each experiment.
N = 100000

# Count of reflections.
Ref = 0.0

# Using PhotonInOneBS() to check the distribution of
# single photons under these rules. Simulate a sequence
# of N photons interacting with a single beam splitter:
for i in range(N):
    # Count of photons leaving 'reflection' output
    # of beam splitter.
    Ref = Ref + PhotonInOneBS(2.0*random())

print("Percent of reflections in one beam splitter",
      1.0*Ref/N)


# Using PhotonInTwoBS() to simulate the two beam splitter
# experiment for a range of optical path differences.

# Subdivisions of the optical path difference.
Steps = 50

# Array of results.
Gr = []

# Run through all possible optical path differences:
for j in range(Steps):
    # Count of reflections.
    Ref = 0.0

    # Randomly re-initialise phase for BS1 Xi1.
    Xi1 = 2.0*random()

    # Randomly re-initialise phase for BS2 Xi2.
    Xi2 = 2.0*random()

    # Simulate a sequence of N photons:
    for i in range(N):
        # Count of photons leaving given output of
        # second beam splitter.
        Ref = Ref + PhotonInTwoBS(2.0*random(),
                                  2.0*j/Steps)

    # Log reflection rate for given optical path
    # difference.
    Gr.append(Ref/N)

    print(j, Ref/N)

# Plotting the results of two beam splitter simulation.
plt.plot(Gr)
plt.ylabel('% of reflections')
plt.show()
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and H. Sulym 


Abstract The chapter presents an analytical study of the joining process of the layers of the AA2519/AA1050/Ti6Al4V laminate produced by the explosive method. The proposed model makes it possible to determine the influence of the joining parameters on the shape and quality of the connection zone. The analysis relies on the elastodynamic theory of material deformation at the connection point under a local traction load. The high pressure during joining is assumed to be confined to a region of the plane surface that moves with a constant subsonic velocity. The residual stress distribution has also been described and analyzed analytically. The results of the analytical investigation were verified by structural examination of the bond zone.


I. Turchyn (✉)
Kazimierz Wielki University, J. K. Khotkiewicha str., 30, 85064 Bydgoszcz, Poland
e-mail: ihorturchyn@gmail.com

I. Pasternak
Lutsk National Technical University, Lvivska Street 75, Lutsk 43018, Ukraine
e-mail: iaroslav.m.pasternak@gmail.com

L. Sniezek · I. Szachogluchowicz · V. Hutsaylyuk (✉)
Military University of Technology, Gen. S. Kaliskiego 2, 00908 Warsaw, Poland
e-mail: volodymyr.hutsaylyuk@wat.edu.pl

L. Sniezek
e-mail: lucjan.sniezek@wat.edu.pl

I. Szachogluchowicz
e-mail: ireneusz.szachogluchowicz@wat.edu.pl

H. Sulym
Bialystok University of Technology, Wiejska 45c, 15351 Bialystok, Poland
e-mail: h.sulym@pb.edu.pl


© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
S. Hošková-Mayerová et al. (eds.), Algorithms as a Basis of Modern Applied Mathematics, Studies in Fuzziness and Soft Computing 404,
https://doi.org/10.1007/978-3-030-61334-1_16




1 Introduction 


The development and progress of many industries are directly related to new technologies and, in particular, to new materials that combine high technical properties and good performance with easy processing, production, and low cost. Structural composite materials and the technologies for their processing and production are among the most promising and research-intensive topics of current materials science and have high potential for future applications in engineering practice [1].

Owing to its properties, explosion welding is one of the most efficient, and in some cases the only possible, way to produce bimaterial and multilayered composite materials [2-4]. Explosion welding is a complex physical phenomenon that involves several fundamental branches of materials science, mechanics, molecular and gas dynamics, etc.

The high-speed collision of bodies during explosion welding is accompanied by several important effects: wave initiation, the Munroe effect, and surface cohesion [5]. The problem touches on different aspects of metal physics, but it is first of all a problem of mechanics. During the cohesion of materials in the solid phase, which is observed in the high-speed oblique collision of two bodies, high-intensity, narrowly localized plastic strains are induced in the contact zone, mainly in a wave-like shape that is observed at the bimaterial interface [6, 7]. Such cohesion of solids occurs without melting of the metals or diffusion processes, and the resulting composite materials possess high strength, even when commonly incompatible metals are welded [5].

As already mentioned, a characteristic feature of welding under high-speed oblique collision is a wave-shaped surface at the contact zone. Many works explain this phenomenon using the hydrodynamic analogy [8-12], molecular dynamics approaches [13], etc. In contrast, this chapter explains the phenomenon from the viewpoint of the elastodynamics [14] of the specimen that acts as a substrate, neglecting the temperature and the sub-surface effects related to shaped-charge formation.


2 Problem Statement 


Explosion welding is the phenomenon of strong bonding between the surfaces of metal bodies that collide at a certain angle, where at least one of them is accelerated to a speed of about 1800–3000 m/s by the detonation products of an explosive charge [15]. It should be noted that the explosion itself, or rather the energy of the expanding detonation products, plays only a secondary role in this process: it provides the accelerated relative movement of the bodies and their subsequent collision. The physical nature of the source of such acceleration can differ: for instance, an electromagnetic field (in magnetic impulse welding), the energy of a powder charge in the barrel of a weapon, or the energy of the explosion of an electric conductor under high current. In all cases, however, the processes during the high-speed collision of solids are the same [5].

Fig. 1 The sketch of the problem

Since this paper considers only the mechanical processes acting in the bottom plate (substrate), the contact interaction of the latter with the specimen is modeled by locally applied tractions $p(x, t)$, which move with a subsonic velocity along the substrate surface, and the substrate itself is modeled as an elastic half-plane (Fig. 1). Thus, the problem is reduced to determining the displacement field in the half-plane and, in particular, the deformed shape of its surface $y = 0$ ahead of the loading source and beneath it.

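In the mathematical sense, the problem is reduced to determining the elastodynamic solution for the volume expansion function $\theta(x, y, t)$ and the normal component $u_y(x, y, t)$ of the elastic displacement vector from the following system of partial differential equations [16]:

$$\frac{\partial^2\theta}{\partial x^2}+\frac{\partial^2\theta}{\partial y^2}=\frac{1}{c_1^2}\frac{\partial^2\theta}{\partial t^2};\qquad \frac{\partial^2 u_y}{\partial x^2}+\frac{\partial^2 u_y}{\partial y^2}=\frac{1}{c_2^2}\frac{\partial^2 u_y}{\partial t^2}+\left(1-\frac{c_1^2}{c_2^2}\right)\frac{\partial\theta}{\partial y}; \tag{1}$$

$$\theta(x,y,0)=\dot\theta(x,y,0)=0;\qquad u_y(x,y,0)=\dot u_y(x,y,0)=0; \tag{2}$$

$$\lim_{y\to\infty}\theta(x,y,t)=\lim_{y\to\infty}u_y(x,y,t)=0; \tag{3}$$

$$\sigma_{yy}=-p_y(x,t),\qquad \sigma_{xy}=p_x(x,t),\qquad y=0, \tag{4}$$

where $c_1=\sqrt{(\lambda+2\mu)/\rho}$ and $c_2=\sqrt{\mu/\rho}$ are the propagation velocities of longitudinal and transverse waves in the substrate material; $\lambda$ and $\mu$ are Lamé's elastic constants; $\kappa=c_1/c_2$; $p_y(x,t)$ and $p_x(x,t)$ are the components of the traction vector at the surface $y=0$, which are defined as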


$$p_x(x,t)=|p(x,t)|\sin\varphi;\qquad p_y(x,t)=|p(x,t)|\cos\varphi, \tag{5}$$

and $\varphi$ is the angle at which the upper and lower specimens collide during explosion welding [17]. It should be mentioned that successful bonding of two metals by explosion welding requires a certain relation between the angle $\varphi$ and the detonation velocity $V_D$, which is determined experimentally for each pair of welded metals. The principal mechanical effects that determine the quality of welding occur directly in the loading zone; thus, to describe them accurately, consider a new coordinate system that moves together with the loading:

$$x_1=x-V_D t,\qquad y=y. \tag{6}$$

In the moving reference frame (6), the governing Eqs. (1) are expressed as follows:


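$$\frac{\partial^2\theta}{\partial x_1^2}+\frac{\partial^2\theta}{\partial y^2}=\frac{1}{c_1^2}\frac{\partial^2\theta}{\partial t^2}-\frac{2V_D}{c_1^2}\frac{\partial^2\theta}{\partial x_1\partial t}+\frac{V_D^2}{c_1^2}\frac{\partial^2\theta}{\partial x_1^2}; \tag{7}$$

$$\frac{\partial^2 u_y}{\partial x_1^2}+\frac{\partial^2 u_y}{\partial y^2}=\frac{1}{c_2^2}\frac{\partial^2 u_y}{\partial t^2}-\frac{2V_D}{c_2^2}\frac{\partial^2 u_y}{\partial x_1\partial t}+\frac{V_D^2}{c_2^2}\frac{\partial^2 u_y}{\partial x_1^2}+\left(1-\frac{c_1^2}{c_2^2}\right)\frac{\partial\theta}{\partial y}. \tag{8}$$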


Consider only the stationary case [11], for which the local time derivatives of the sought functions can be neglected. Thus, in the dimensionless variables $\alpha=x_1/L$, $\gamma=y/L$, where $L$ is a characteristic dimension (for example, the length of the segment over which the loading is applied at a given time), the boundary value problem is reduced to

$$(1-M_1^2)\frac{\partial^2\theta}{\partial\alpha^2}+\frac{\partial^2\theta}{\partial\gamma^2}=0;\qquad (1-M_2^2)\frac{\partial^2 v}{\partial\alpha^2}+\frac{\partial^2 v}{\partial\gamma^2}=(1-\kappa^2)\frac{\partial\theta}{\partial\gamma}; \tag{9}$$

$$\lim_{\gamma\to+\infty}\theta(\alpha,\gamma)=\lim_{\gamma\to+\infty}v(\alpha,\gamma)=0; \tag{10}$$

$$\sigma_{\gamma\gamma}=-p_y(\alpha),\qquad \sigma_{\alpha\gamma}=p_x(\alpha),\qquad \gamma=0, \tag{11}$$

where $v(\alpha,\gamma)=u_y(x,y)/L$ is the dimensionless component of the displacement vector and $M_i=V_D/c_i$ ($i=1,2$) are the Mach numbers. It should be noted that the signs of the terms $(1-M_i^2)$ determine the type of Eqs. (9) and thus the solution strategy. This study considers all three cases.
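The classification by the signs of $(1 - M_i^2)$ can be illustrated with a short script. This is a minimal sketch and not part of the original chapter: the function name `classify_regime` is hypothetical, and the wave speeds $c_1 = 6300$ m/s and $c_2 = 3200$ m/s are the values quoted in the chapter's numerical analysis for the aluminium alloy substrate. The limiting cases $M_i = 1$ are assigned to the faster regime purely for convenience.

```python
def classify_regime(v_d, c1, c2):
    """Classify the loading regime from the Mach numbers M_i = V_D / c_i.

    v_d: detonation velocity, c1: longitudinal wave speed,
    c2: transverse wave speed (all in the same units, c1 > c2).
    """
    if not (c1 > c2 > 0.0 and v_d > 0.0):
        raise ValueError("expected c1 > c2 > 0 and v_d > 0")
    m1, m2 = v_d / c1, v_d / c2
    if m2 < 1.0:
        regime = "subsonic"    # both equations of the system are elliptic
    elif m1 < 1.0:
        regime = "transonic"   # theta-equation elliptic, v-equation hyperbolic
    else:
        regime = "supersonic"  # both equations hyperbolic
    return m1, m2, regime


C1, C2 = 6300.0, 3200.0  # m/s, assumed from the chapter's numerical section
for v_d in (2800.0, 4000.0, 7000.0):
    m1, m2, regime = classify_regime(v_d, C1, C2)
    print(f"V_D = {v_d:6.0f} m/s: M1 = {m1:.3f}, M2 = {m2:.3f} -> {regime}")
```

With these assumed wave speeds, the detonation velocities 2800, 4000 and 7000 m/s used in the chapter's calculations fall into the subsonic, transonic and supersonic regimes, respectively.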
3 Derivation of the Problem’s Solution

3.1 Case 1: Subsonic Detonation Velocity ($M_1^2 < M_2^2 < 1$)


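The Fourier integral transform [18] with respect to the variable $\alpha$ can be applied to the solution of problem (9)-(11). Consequently, one obtains a boundary value problem for a system of ordinary differential equations

$$\frac{d^2\bar\theta}{d\gamma^2}-\xi^2(1-M_1^2)\bar\theta=0;\qquad \frac{d^2\bar v}{d\gamma^2}-\xi^2(1-M_2^2)\bar v=\left(1-\frac{c_1^2}{c_2^2}\right)\frac{d\bar\theta}{d\gamma}; \tag{12}$$

$$\lim_{\gamma\to+\infty}\bar\theta(\xi,\gamma)=\lim_{\gamma\to+\infty}\bar v(\xi,\gamma)=0; \tag{13}$$

$$\bar\sigma_{\gamma\gamma}=-\bar p_y(\xi),\qquad \bar\sigma_{\alpha\gamma}=\bar p_x(\xi),\qquad \gamma=0. \tag{14}$$

Here $\bar\theta(\xi,\gamma)$ and $\bar v(\xi,\gamma)$ are the Fourier transforms [18] of $\theta(\alpha,\gamma)$ and $v(\alpha,\gamma)$ with respect to $\alpha$, and $\xi$ is the transform parameter. The solution of Eqs. (12) that satisfies the Sommerfeld radiation conditions (13) is as follows:

$$\bar\theta(\xi,\gamma)=A(\xi)e^{-\beta_1|\xi|\gamma},\qquad \bar v(\xi,\gamma)=B(\xi)e^{-\beta_2|\xi|\gamma}+\frac{\beta_1}{|\xi|M_1^2}A(\xi)e^{-\beta_1|\xi|\gamma}, \tag{15}$$

where $\beta_j^2=1-M_j^2$ ($j=1,2$).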
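To determine the component $u(\alpha,\gamma)$ of the displacement vector, the Fourier integral transform is applied to the expression for the volume expansion, $\theta=\frac{\partial u}{\partial\alpha}+\frac{\partial v}{\partial\gamma}$, which gives $\bar\theta=-i\xi\bar u+\frac{d\bar v}{d\gamma}$. Accounting for the solutions (15), one obtains

$$\bar u=\frac{i}{\xi}\left(\bar\theta-\frac{d\bar v}{d\gamma}\right)=\frac{i}{\xi}\left(\frac{1}{M_1^2}A(\xi)e^{-\beta_1|\xi|\gamma}+\beta_2|\xi|B(\xi)e^{-\beta_2|\xi|\gamma}\right). \tag{16}$$

The Fourier transforms of the stress tensor components are determined through Hooke's law with Eqs. (15) and (16) as

$$\frac{\bar\sigma_{\gamma\gamma}}{\mu}=\left(\frac{c_1^2}{c_2^2}-2\right)\bar\theta+2\frac{d\bar v}{d\gamma}=-\frac{1+\beta_2^2}{M_1^2}A(\xi)e^{-\beta_1|\xi|\gamma}-2\beta_2|\xi|B(\xi)e^{-\beta_2|\xi|\gamma}; \tag{17}$$

$$\frac{\bar\sigma_{\alpha\gamma}}{\mu}=\frac{d\bar u}{d\gamma}-i\xi\bar v=-i\,\mathrm{sgn}(\xi)\frac{2\beta_1}{M_1^2}A(\xi)e^{-\beta_1|\xi|\gamma}-i\xi(\beta_2^2+1)B(\xi)e^{-\beta_2|\xi|\gamma}, \tag{18}$$

where $\mathrm{sgn}(\xi)$ is the sign function.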


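Accounting for the boundary conditions (14), one obtains the system of linear algebraic equations

$$\frac{1+\beta_2^2}{M_1^2}A(\xi)+2\beta_2|\xi|B(\xi)=\frac{\bar p_y(\xi)}{\mu};\qquad \mathrm{sgn}(\xi)\frac{2\beta_1}{M_1^2}A(\xi)+\xi(\beta_2^2+1)B(\xi)=\frac{i\,\bar p_x(\xi)}{\mu}, \tag{19}$$

whose solution gives the following expressions for the unknowns $A(\xi)$ and $B(\xi)$:

$$A(\xi)=\frac{M_1^2}{\mu}\,\frac{(1+\beta_2^2)\,\bar p_y(\xi)-2i\beta_2\,\mathrm{sgn}(\xi)\,\bar p_x(\xi)}{(1+\beta_2^2)^2-4\beta_1\beta_2};\qquad B(\xi)=\frac{1}{\mu|\xi|}\,\frac{i\,\mathrm{sgn}(\xi)(1+\beta_2^2)\,\bar p_x(\xi)-2\beta_1\,\bar p_y(\xi)}{(1+\beta_2^2)^2-4\beta_1\beta_2}. \tag{20}$$

As a result, with account of Eq. (5), the Fourier transforms of the components of the displacement vector take the form

$$\bar v(\xi,\gamma)=\frac{\bar p(\xi)}{\mu}\left[\beta_1\frac{(1+\beta_2^2)\cos\varphi-2i\beta_2\,\mathrm{sgn}(\xi)\sin\varphi}{(1+\beta_2^2)^2-4\beta_1\beta_2}\,\frac{e^{-\beta_1|\xi|\gamma}}{|\xi|}+\frac{i\,\mathrm{sgn}(\xi)(1+\beta_2^2)\sin\varphi-2\beta_1\cos\varphi}{(1+\beta_2^2)^2-4\beta_1\beta_2}\,\frac{e^{-\beta_2|\xi|\gamma}}{|\xi|}\right]; \tag{21}$$

$$\bar u(\xi,\gamma)=\frac{i\,\bar p(\xi)}{\mu}\left[\frac{(1+\beta_2^2)\cos\varphi-2i\beta_2\,\mathrm{sgn}(\xi)\sin\varphi}{(1+\beta_2^2)^2-4\beta_1\beta_2}\,\frac{e^{-\beta_1|\xi|\gamma}}{\xi}+\beta_2\frac{i\,\mathrm{sgn}(\xi)(1+\beta_2^2)\sin\varphi-2\beta_1\cos\varphi}{(1+\beta_2^2)^2-4\beta_1\beta_2}\,\frac{e^{-\beta_2|\xi|\gamma}}{\xi}\right]. \tag{22}$$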


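To obtain the inverses of expressions (21) and (22), one can use the convolution theorem for the Fourier transform and the fact that the inverses of $\bar p(\xi)$, $\frac{i\,\mathrm{sgn}(\xi)}{|\xi|}e^{-\beta_j|\xi|\gamma}$ and $\frac{1}{|\xi|}e^{-\beta_j|\xi|\gamma}$ for $\gamma\ge 0$ are, respectively,

$$p(\alpha),\qquad \frac{1}{\pi}\arctan\left(\frac{\alpha}{\beta_j\gamma}\right),\qquad -\frac{1}{2\pi}\left(C+\ln\left(\alpha^2+\beta_j^2\gamma^2\right)\right). \tag{23}$$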


It is known that the vertical displacement in the elasticity problem for a half-plane is determined up to a constant. Therefore, one can assume that $C = 0$ in Eq. (23). Moreover, one can assume that the loading distribution at the surface of the half-plane follows the Hertz contact distribution

$$p(\alpha)=\begin{cases}p^*\sqrt{1-\alpha^2}, & |\alpha|\le 1;\\ 0, & |\alpha|>1,\end{cases}$$

where $p^*$ is the pressure at the center of the loading segment.
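As a quick numerical check of the Hertz-type loading, the sketch below evaluates $p(\alpha)$ and integrates it over the loading segment; for $p^* = 1$ the resultant should equal the area of a unit half-disc, $\pi/2$. The function names and the midpoint rule are illustrative choices and are not taken from the chapter.

```python
import math


def hertz_load(alpha, p_star=1.0):
    """Hertz-type loading: p* sqrt(1 - alpha^2) on |alpha| <= 1, zero outside."""
    if abs(alpha) > 1.0:
        return 0.0
    return p_star * math.sqrt(1.0 - alpha * alpha)


def total_load(p_star=1.0, n=20000):
    """Resultant of the loading over [-1, 1] by the midpoint rule (n cells)."""
    h = 2.0 / n
    return h * sum(hertz_load(-1.0 + (k + 0.5) * h, p_star) for k in range(n))


print("numerical resultant:", total_load())
print("pi / 2             :", math.pi / 2.0)
```

The resultant scales linearly with `p_star`, which is consistent with the displacement components depending on the loading magnitude only through a scale factor.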
Consequently based on Eqs. (21), (22) one obtains 
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1 
ss (1+ 63) = 2p si 
u(a, y) “a i ues [vo P)arere( "an a x 
u| (1+ 63) 46162 -1 
1 
x J V= 28) n(6P 7? + = m2) an 
-1 
1 
(1 + B2)Bo si 
mame f J 1?) in( (637? + (a — 9? dy 
- *hifaeose fi (1— 7?) Jaretg( yan; 


1 

* 1 2 

v(@, y) Bi oo | vo r) n( [872+ (@— 02) dn— 
u|(1-+ 62) - 4616 =1 


1 
ee m)arcre( "Ja 
) 1 
r pene J V0) in (880? + = 0? Jane 
-1 
1 
(1+ 63) si = 
mee | Jase( "én, (24) 
“1 


At the surface y = 0 the components of the displacement vector are as follows 


1 
* 1+ 62-2, ‘ 
u(a, 0) oe ( a! ae [ve n?)sgn(a n)dn + 
=] 


(1+ 63) - 48162 
1 
M2 Bo si 
mee la 1?) In| nn} 
“1 


1 

* My 

v(t, 0) ae (4 Bee f t= 2) nia — nln 
(1+ 63) - 4616 2 


1 
(1+ p3 - eee {ve / 2)sen(a al (25) 
-1 


It can be shown [17] that the relation $(1+\beta_2^2)^2-4\beta_1\beta_2=0$ is equivalent to the known equation that defines the Rayleigh wave velocity; therefore, resonance phenomena are observed when the detonation velocity tends to the Rayleigh wave velocity in the substrate material.

In the same manner one can determine other nonzero components of stress tensor. 
First their transforms 
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(1 + B3) cos g — 2iBosgn(€) sin p o-bilély 
(1 + p3)° — 4B Bo 
i2Posen(E)(1 + 65) sin p — 2B) B2 cos p aa 
(1 + p3)” — 4B; Bo 
isgn(€)2B1 (1 + Bz) cos y — 461 fp sing en Pilély 
(1 + B3)° — 4B: Bo 
(1 + 63)? sing + i2sgn(€)Bi(1 + 63) cosy a 
(1 + 63)” — 4618. 


Oy, Y) =r 


Cay (&, yY) =— pe|= 


+ 


2, (1 +63) cosy — 2iprsen(E) sine pyety , 
(1 + 63)” — 4B Bo 

i sen()(1 + 63) sing — 2f1 cos g eo 
(1+ 63)’ — 4B.B: 


Tie ey = pe|a — By + 26?) 


(26) 


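Then, according to the convolution theorem and accounting for the inverses of $\bar p(\xi)$, $i\,\mathrm{sgn}(\xi)e^{-\beta_j|\xi|\gamma}$ and $e^{-\beta_j|\xi|\gamma}$, which are, respectively,

$$p(\alpha)=\begin{cases}p^*\sqrt{1-\alpha^2}, & |\alpha|\le 1;\\ 0, & |\alpha|>1,\end{cases}\qquad \frac{1}{\pi}\,\frac{\alpha}{\alpha^2+\beta_j^2\gamma^2},\qquad \frac{1}{\pi}\,\frac{\beta_j\gamma}{\alpha^2+\beta_j^2\gamma^2},$$

one finally obtains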
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4616 Bi 
As in the case of displacements, all of the stress components grow without bound when the detonation speed tends to the Rayleigh wave velocity.
The limiting case for the detonation velocity Vp = c2 (M2 = 1; M, = fo= 


0; 6, =,/ tik, kK=1- ae) is not a critical one. The solution of the problem 
exists and is as follows [19]: 


ge 2D dy; 


1 
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* aa 5 
The numerical analysis is carried out for a substrate made of aluminium alloy, which has the following elastic properties [4]: $\lambda = 53.1$ GPa, $\mu = 26.5$ GPa; therefore, $c_1 = 6.3\cdot 10^3$ m/s and $c_2 = 3.2\cdot 10^3$ m/s, respectively. The Rayleigh wave velocity in this medium equals $c_R = 2.98\cdot 10^3$ m/s.
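The quoted Rayleigh wave velocity can be cross-checked against the relation $(1+\beta_2^2)^2-4\beta_1\beta_2=0$ with $\beta_j^2 = 1 - (V/c_j)^2$. The following is a minimal sketch using plain bisection; the function names and the bracketing interval $[0.5\,c_2,\ c_2]$ are assumptions made here for illustration, and the wave speeds are the rounded values given above.

```python
import math


def rayleigh_function(v, c1, c2):
    """Left-hand side (1 + beta2^2)^2 - 4 beta1 beta2 for a trial speed v <= c2."""
    beta1 = math.sqrt(1.0 - (v / c1) ** 2)
    beta2 = math.sqrt(1.0 - (v / c2) ** 2)
    return (1.0 + beta2 ** 2) ** 2 - 4.0 * beta1 * beta2


def rayleigh_velocity(c1, c2, tol=1e-9):
    """Root of the Rayleigh relation by bisection on [0.5*c2, c2]."""
    lo, hi = 0.5 * c2, c2
    if not (rayleigh_function(lo, c1, c2) < 0.0 < rayleigh_function(hi, c1, c2)):
        raise ValueError("root is not bracketed; adjust the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rayleigh_function(mid, c1, c2) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c_r = rayleigh_velocity(6300.0, 3200.0)
print(f"c_R = {c_r:.0f} m/s, c_R / c_2 = {c_r / 3200.0:.4f}")
```

With the rounded wave speeds the root lies close to $2.98\cdot 10^3$ m/s, in line with the value stated in the text.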
As mentioned above, explosion welding occurs only at certain ratios between the detonation velocity and the angle at which the oblique collision of the specimens occurs. In the calculations, the detonation velocity $V_D$ is selected in the range [2000; 2900] m/s, the collision angle in the range $\varphi \in [0.24; 0.35]$ rad, and $p^*/\mu = 0.01$. Figure 2 presents the calculated normal displacement $v(\alpha, 0)$ at the boundary of the half-plane for $p^*/\mu = 0.01$, $\varphi = 0.3$ and different values of the detonation velocity $V_D$, and Fig. 3 depicts the normal displacement of the boundary for $p^*/\mu = 0.01$, $V_D = 2800$ m/s and different values of the collision angle $\varphi$. These figures show that, under a constant loading magnitude $p^*$, the displacement of the half-plane’s boundary is influenced predominantly by the detonation velocity, especially when it approaches the Rayleigh wave velocity in the substrate material.
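The sensitivity to the detonation velocity can be traced to the denominator $(1+\beta_2^2)^2-4\beta_1\beta_2$ shared by the subsonic displacement and stress expressions. The sketch below tabulates it over the velocity range used in the calculations; it is only an illustration, with the function name chosen here and the wave speeds $c_1 = 6300$ m/s, $c_2 = 3200$ m/s taken from the chapter's material data.

```python
import math


def rayleigh_denominator(v_d, c1=6300.0, c2=3200.0):
    """(1 + beta2^2)^2 - 4 beta1 beta2 for a subsonic detonation velocity v_d < c2."""
    beta1 = math.sqrt(1.0 - (v_d / c1) ** 2)
    beta2 = math.sqrt(1.0 - (v_d / c2) ** 2)
    return (1.0 + beta2 ** 2) ** 2 - 4.0 * beta1 * beta2


for v_d in (2000.0, 2500.0, 2800.0, 2900.0, 2950.0):
    d = rayleigh_denominator(v_d)
    print(f"V_D = {v_d:4.0f} m/s: denominator = {d:+.4f}, "
          f"1/|denominator| = {1.0 / abs(d):6.2f}")
```

The magnitude of the denominator shrinks as $V_D$ approaches the Rayleigh wave velocity, so the reciprocal factor multiplying the response grows accordingly.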

The influence of the ratio $p^*/\mu$ on the separate components of the displacement vector reduces to a scale factor. However, this ratio has a significant effect on the shape of the half-plane’s surface, which is shown in Fig. 4 for $V_D = 2800$ m/s and $\varphi = 0.3$. It should be noted that, for the problem to remain linear, this ratio should be much less than one; nevertheless, in a crude approximation the presented results can predict the typical large-scale wave structure at the surface of the substrate.

Fig. 2 Normal displacement $v(\alpha, 0)$ of the half-plane boundary for different values of the detonation velocity

Fig. 3 Normal displacement of the half-plane boundary for different values of collision angle

Fig. 4 The shape of the half-plane’s surface for different values of loading magnitude

It should also be taken into account that, just before its contact with the substrate, the colliding specimen is subjected to high-magnitude alternating bending loading caused by the explosion. This loading can initiate plastic bands (zones of softened material) in the specimen, which will “fill” the wave-shaped substrate after the collision. This process is accompanied and supported by the high pressure of gases in the zone just beneath the contact area.

Figures 5, 6, and 7 present the calculated stress state of the half-space. All stresses vanish at infinity. The normal and transverse stresses are close in their qualitative distribution and possess extrema in the loading zone.


3.2 Case 2: Transonic Detonation Velocity ($M_1^2 < 1 < M_2^2$)


In this case the type of the second equation in (9) changes, and this also affects the transformed equations in (12). The second of them can be written as

$$\frac{d^2\bar v}{d\gamma^2}+\xi^2(M_2^2-1)\bar v=\left(1-\frac{c_1^2}{c_2^2}\right)\frac{d\bar\theta}{d\gamma}. \tag{28}$$


Fig. 5 Distribution of normal stresses $\sigma_{\gamma\gamma}(\alpha, \gamma)/p^*$ under subsonic detonation velocity

Fig. 6 Distribution of shear stresses $\sigma_{\gamma\alpha}(\alpha, \gamma)/p^*$ under subsonic detonation velocity

Fig. 7 Distribution of transverse stresses $\sigma_{\alpha\alpha}(\alpha, \gamma)/p^*$ under subsonic detonation velocity

To obtain the general solution of this equation, the radiation condition is taken in the form

$$\frac{d\bar v}{d\gamma}+i\xi\beta_2\bar v=o\!\left(\frac{1}{\gamma}\right),\qquad \beta_2=\sqrt{M_2^2-1}. \tag{29}$$

This condition is satisfied by the general solution of Eq. (28)

$$\bar v(\xi,\gamma)=B(\xi)e^{-i\beta_2\xi\gamma} \tag{30}$$

and finally
Similar to Eq. (16), one can determine $\bar u(\xi,\gamma)$

$$\bar u=\frac{i}{\xi}\left(\bar\theta-\frac{d\bar v}{d\gamma}\right)=\frac{i}{\xi}\left(\frac{1}{M_1^2}A(\xi)e^{-\beta_1|\xi|\gamma}+i\beta_2\xi B(\xi)e^{-i\beta_2\xi\gamma}\right) \tag{32}$$


and the Fourier transforms of stress tensor components are, respectively, 


a ‘ Ov 1+ By 
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(33) 


As shown above, the inclination angle of the external loading only slightly affects the results. Therefore, for convenience and to simplify the resulting equations, assume that $p_x(\alpha) = 0$ in the boundary conditions (11). Then, according to these conditions, one obtains the following system of equations

$$\frac{1-\beta_2^2}{M_1^2}A(\xi)+2i\beta_2\xi B(\xi)=\frac{\bar p_y(\xi)}{\mu};\qquad \mathrm{sgn}(\xi)\frac{2\beta_1}{M_1^2}A(\xi)-\xi(\beta_2^2-1)B(\xi)=0, \tag{34}$$

whose solution is

$$A(\xi)=\frac{M_1^2(1-\beta_2^2)\,\bar p_y(\xi)}{\mu\left[(1-\beta_2^2)^2-4i\beta_1\beta_2\,\mathrm{sgn}(\xi)\right]},\qquad B(\xi)=-\frac{2\beta_1\,\bar p_y(\xi)}{\mu\left[(1-\beta_2^2)^2-4i\beta_1\beta_2\,\mathrm{sgn}(\xi)\right]}\,\frac{1}{|\xi|}. \tag{35}$$


Based on the known $A(\xi)$ and $B(\xi)$, one can determine the transforms of the displacements and stresses. Eliminating the complex expression in the denominator, one finally obtains


: Py (é)i [ (C — 62)? + 4iB1 Bosen(e)) (1 — Be P16” — 126 Bye "P28 sente)) (36) 

ae we M® —4(1 —«2)MP +22 +«)(1—«)2M? — (1 —«2)(1—«)2 
- Py(é)p1 | (C- 2? + 4iB1 Bosent)) (1 — BZ) Alb” — 2e~iba67) (37) 

ee MEL | MS — 411 — e2)Mt + 202+ «)(1 — «)2M? — (1 —«2)(1 — «2 
Taw(é, vy) = Py (§)( B3) 
we 
(1 = 63) + 418; Bosenlé)) (267 + MZ) — Bz)e~AIIEIY — ai, Bye 1P287 son(e)) 

S 6 q 2 (38) 

Mp — 4(. — «?)My +2(2+ «(UL — «)?MF - (1— «(1 «)? 

(1 = 83)? + 4if 1 Bosgn(é)) (e~P1l6l” — e287) sence) 

Eee errr 39 
Fae ee | a 4 — «2)M4 +22 +0) — «2M? — «20 — 02 ou 


Tyy (EY) 
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In Eqs. (36)-(40), the same as above k = | — 22, 
According to the convolution theorem, accounting for the inverses of expressions 


cme ao e~'5Y and ie—'5’ sen(&), which are, respectively, 


1 1 1 
oe] , 6 i 41 
5 Sen(a + 7) = nla + y| Cr ea (41) 


where $\delta(x)$ is the Dirac delta function, one can obtain the inverses of Eqs. (36)-(40)


u(a, y) = wail 1 p3)° fi n? arctan — a ain + 461 Bol — B3)°x 
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where f(M,) = M¢ —4(1—«?)M}#42(24+«)(1 
At the surface of the half-space displacements are evaluated to 
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(48) 
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It should be noted that for the following range of M; : - < M, < 1, equation 
Ff (¢) = 0 does not have real roots, therefore, the solution of the problem in transonic 
range exists and is finite for arbitrary detonation velocity of this range. In the limit case 
of Vp = cp the solution converges to corresponding limiting solution for subsonic 
detonation velocity. In the other limit case of Vp = a and respectively M, = 
om Mi= 1h pe = a —1; B =0; f(M) = ($ = 2) , Eqs. (42)-(46) result in 


one-dimensional stress—strain state 


1 
"= «) 
u(a, y) = oa / VT= rsgn(a — n)dn; v= 0; (49) 
=i 
(a) 
Coa = —— Cay = 0; Oyy = —p(a). (50) 


In the derivation of Eqs. (49) and (50) the following limit was used [19]:

$$\lim_{y\to+0}\frac{y}{x^2+y^2}=\pi\delta(x). \tag{51}$$


Figure 8 presents the calculated shape of the surface of a half-plane whose physical and mechanical properties are the same as in the case of subsonic loading speed, for the detonation velocity $V_D = 4000$ m/s. One can see in Fig. 8 that, as in the previous case, the ratio $p^*/\mu$ influences both the qualitative and quantitative characteristics of the shape of the half-plane surface. Moreover, the characteristic “wavy” shape of the surface of the base plate is also visible in the case of transonic detonation velocity.


Fig. 8 The shape of the half-plane surface under transonic detonation velocity


Modeling of Selected Physical Phenomena Under Explosion Welding ... 321 


The corresponding results for the calculated stress state of the half-plane are presented in Figs. 9, 10 and 11. In contrast to the previous case, the stress state is not concentrated near the loading zone but extends deep into the half-plane.
Fig. 10 Distribution of transverse stress $\sigma_{\alpha\alpha}(\alpha, \gamma)/p^*$ under transonic detonation velocity

Fig. 11 Distribution of shear stresses $\sigma_{\alpha\gamma}(\alpha, \gamma)/p^*$ under transonic detonation velocity


Fig. 12 Fronts of Mach 
waves in the half-plane 


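3.3 Case 3: Supersonic Detonation Velocity ($1 < M_1 < M_2$)


In this case both equations for $\theta(\alpha, \gamma)$ and $v(\alpha, \gamma)$ are of hyperbolic type. The transformed equation for $\bar\theta(\xi, \gamma)$ is as follows

$$\frac{d^2\bar\theta}{d\gamma^2}+\xi^2(M_1^2-1)\bar\theta=0, \tag{52}$$

and its solution, which satisfies the Sommerfeld radiation condition

$$\frac{d\bar\theta}{d\gamma}+i\xi\beta_1\bar\theta=o\!\left(\frac{1}{\gamma}\right),\qquad \beta_1=\sqrt{M_1^2-1}, \tag{53}$$

is written as

$$\bar\theta(\xi,\gamma)=A(\xi)e^{-i\beta_1\xi\gamma}. \tag{54}$$

The equation for $\bar v(\xi, \gamma)$ in this case is as follows

$$\frac{d^2\bar v}{d\gamma^2}+\xi^2(M_2^2-1)\bar v=-\left(1-\frac{c_1^2}{c_2^2}\right)i\beta_1\xi A(\xi)e^{-i\beta_1\xi\gamma}, \tag{55}$$

and its solution, which satisfies the Sommerfeld radiation condition (29), can be obtained in the form

$$\bar v(\xi,\gamma)=B(\xi)e^{-i\beta_2\xi\gamma}+\frac{i\beta_1}{\xi M_1^2}A(\xi)e^{-i\beta_1\xi\gamma}. \tag{56}$$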
Next, similar to the previous cases, one can determine $\bar u(\xi, \gamma)$

and, respectively, the Fourier transforms of the stress tensor components

According to the boundary conditions with $p_x(\alpha) = 0$, one obtains the following system of equations

$$\frac{1-\beta_2^2}{M_1^2}A(\xi)+2i\beta_2\xi B(\xi)=\frac{\bar p_y(\xi)}{\mu};\qquad \frac{2\beta_1}{M_1^2}A(\xi)+i\xi(\beta_2^2-1)B(\xi)=0,$$

whose solution can be derived as
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Transforms of displacements and stresses are obtained from Eqs. (56)—(59) as 
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At the surface of the half-space. 


1 
p*[C. — Bz) + 26162] | 
,0) = vil 7 —n)dn; 
u(a ) 2u((1 — Bz)? + 4B; Ba) J UT sgn(a n) U] 


1 


1 
p*Bi(2 — B3) i 
,0) = vl—7 —n)dn. 66 
v(qa, 0) Dil — BD? + 4BiPr) | n-sgn(a — n)dn (66) 


It should be noted, that integrals incorporated in these solutions can be evaluated 
analytically as 


Bo 5, a+fiy>1 
In the limit case $V_D = c_1$ this yields the same equations, Eqs. (49) and (50), as those obtained in the transonic case.

The analysis of Eqs. (65)-(67) shows the presence in the half-space of four characteristic zones between the fronts of Mach waves $\alpha + \beta_j\gamma = \pm 1$ (Fig. 12).

Displacements are constant in zones 1 and 5. Zone 2 possesses stresses and strains caused by the P-wave, zone 3 by the P-wave and the SV-wave, and zone 4 by the SV-wave only.
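The geometry of the Mach fronts can be made concrete with a few numbers. The sketch below is an illustration only: it assumes that the fronts are the characteristic lines $\alpha + \beta_j\gamma = \text{const}$ with $\beta_j = \sqrt{M_j^2 - 1}$, so that each front makes the Mach angle $\arcsin(1/M_j)$ with the surface; the function name is hypothetical, and the wave speeds and the detonation velocity of 7000 m/s are the values used in the chapter's supersonic example.

```python
import math


def mach_front_parameters(v_d, c1, c2):
    """Return {wave: (Mach number, beta, Mach angle in degrees)} for v_d > c1 > c2."""
    if not (v_d > c1 > c2 > 0.0):
        raise ValueError("supersonic case requires v_d > c1 > c2 > 0")
    result = {}
    for name, c in (("P", c1), ("SV", c2)):
        mach = v_d / c
        beta = math.sqrt(mach ** 2 - 1.0)
        # Angle between the front alpha + beta*gamma = const and the surface gamma = 0.
        angle = math.degrees(math.asin(1.0 / mach))
        result[name] = (mach, beta, angle)
    return result


for wave, (mach, beta, angle) in mach_front_parameters(7000.0, 6300.0, 3200.0).items():
    print(f"{wave:>2}-wave: M = {mach:.3f}, beta = {beta:.3f}, Mach angle = {angle:.1f} deg")
```

Because $c_1 > c_2$, the P-wave fronts are steeper than the SV-wave fronts, which is what separates the half-space into zones affected by one, both, or neither of the two wave types.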

According to the causality principle, displacements and stresses in zone 1 should be zero. Therefore, the displacement field is defined as
Figure 13 presents the calculated shape of the half-space surface under a supersonic loading speed of $V_D = 7000$ m/s. As in the previous cases, the ratio $p^*/\mu$ has a significant influence on the shape of the surface $\gamma = 0$. At the same time, for all of the considered ratios of detonation velocity to elastic wave speed in the half-space material, a specific wavy shape is observed on its surface. The results for the calculated stress state of the half-plane are presented in Figs. 14, 15, and 16. As in the previous case, the stress state extends deep inside the half-plane.
Fig. 13 The shape of the half-space surface under supersonic detonation velocity

Fig. 14 Distribution of normal stresses $\sigma_{\gamma\gamma}(\alpha, \gamma)/p^*$ under supersonic detonation velocity

Fig. 15 Distribution of transverse stresses $\sigma_{\alpha\alpha}(\alpha, \gamma)/p^*$ under supersonic detonation velocity

Fig. 16 Distribution of shear stresses $\sigma_{\gamma\alpha}(\alpha, \gamma)/p^*$ under supersonic detonation velocity


4 Finite Element Simulation of Elastic–Plastic
Deformations Under Explosion Welding


4.1 Finite Element Simulation of Clad Plate Deformation 
Under Explosion Welding 


Consider the deformation of the clad plate under moving explosion loading, which for simplicity is replaced with a moving rigid punch. Assume that the plate rests on a rigid base. The problem is modeled with the finite element method. Other modeling approaches can be found in, e.g., [14, 20].

Fig. 17 Development of elastic waves in the clad plate ahead of loading front

When this problem is solved for an elastic–plastic material, the values of the detonation velocity and the yield stress have a significant influence. In particular, at low detonation velocities or at a high yield stress of the plate material, elastic waves develop and spread through the clad plate, changing its geometric configuration, so that poorly welded areas may appear. This can also lead to a nonuniform distribution of the undetonated charge on the surface of the plate. Such development of elastic waves is presented in Fig. 17.

In the case reported in Fig. 17, the basic loading produces a high stress concentration in the part of the plate directly under the loading. Plastic hinges thus develop at the top and bottom of the inclined part of the plate, which supports its bending toward the substrate. However, the part that has not yet been affected by the moving loading starts vibrating practically as a membrane. This evidently makes this welding mode unstable, since the propagation of elastic waves can change the distribution of the charge on the plate surface.

Figure 18 shows the deformation of a clad plate with higher plasticity and detonation velocity than in the previous case. The character of the plate deformation is visibly different. The deformed parts possess lower stresses at their midpoints than at their ends, where plastic hinges develop. A plastic hinge also develops in the straight part of the plate, which is in the detonation zone. Wave effects are not observed in this part of the plate owing to the high plasticity and high detonation velocity.

The corresponding character of deformation of the clad plate is similar to double bending on plastic hinges. Thus, one can develop the following simplified analytical model of the deformation of the clad plate under explosion welding.

Fig. 18 Deformation of the clad plate under moving loading


5 Simplified Analytical Model of Deformation of the Clad 
Plate 


Assume that, under the influence of the explosion impulse on the inclined part 1 of the clad plate, plastic hinges 1 and 2 develop at its ends. Also, due to the bending caused by the explosion pressure, plastic hinge 3 develops at the horizontal part 3 of the clad plate. The length of zone 2 is taken equal to the length of zone 1, since the plate inclination angle under explosion welding is constant, which is in agreement with experimental data.

Accounting for inertia forces and the short duration of explosion impulses, one can assume that section 2 practically does not deform. At the same time, end 2 of section 1 of the clad plate should follow the path defined by the arc centered at point 3 with radius $l$ (dashed curve in Fig. 19). It is obvious that in this case section 1 of the plate should successively undergo longitudinal tensile and compressive deformations. Thus, considering the action of the explosive impulse, it is assumed that this section does not lose its stability in the longitudinal direction.

Fig. 19 The scheme of the “elementary chain” model of deformation of the clad plate

Depending on the polar angle $\theta$, these deformations are defined by an obvious geometrical relation:


Al 
1 


= 1—cosé@ —cos(B — 6) +4 /2cos 0 cos(B — 6)(1 + cos B) — cos B(cos B — 2). (68) 


It is obvious that the maximum value of the strain (68) is reached at the polar angle $\theta_{\max} = \beta/2$. Substituting this value into Eq. (68), one obtains

$$\left.\frac{\Delta l}{l}\right|_{\theta=\theta_{\max}}=2\left(1-\cos\frac{\beta}{2}\right). \tag{69}$$

Thus, on the basis of Eq. (69), one obtains the following estimate for the collision angle $\beta$:

$$\delta_{\max}=\left.\frac{\Delta l}{l}\right|_{\theta=\theta_{\max}},\qquad \beta=2\arccos\left(1-\frac{\delta_{\max}}{2}\right). \tag{70}$$

As the minimum limit value of the strain $\delta_{\max}$, the deformation reached at the yield stress is assumed. As the maximum limit value of $\delta_{\max}$, one can use experimentally obtained values of the elongation $\delta_B$ of the sample under dynamic loading. Thus, one can write the following range for the estimate of the collision angle $\beta$ of solids in explosion welding:

$$2\arccos\left(1-\frac{\sigma_Y}{2E}\right)<\beta<2\arccos\left(1-\frac{\delta_B}{2}\right). \tag{71}$$


In particular, for aluminium AA1050A H8, $\sigma_Y = 140$ MPa, $\delta_B = \delta_{50} = 0.05$, $E = 68.3$ GPa [21], so one obtains the following estimate of the collision angle:

$$5.2^\circ<\beta<25.7^\circ, \tag{72}$$

which agrees well with experimental data.
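The bounds of this estimate can be reproduced with a few lines of code. The sketch below assumes the relation $\beta = 2\arccos(1 - \delta/2)$ between the limiting strain $\delta$ and the collision angle, with the yield strain $\sigma_Y/E$ as the lower limit and the elongation $\delta_B$ as the upper limit; the function and variable names are illustrative.

```python
import math


def collision_angle_deg(strain):
    """Collision angle (degrees) for a limiting strain: beta = 2 arccos(1 - strain/2)."""
    if not 0.0 <= strain <= 2.0:
        raise ValueError("strain must lie in [0, 2] for the arccos to be defined")
    return math.degrees(2.0 * math.acos(1.0 - strain / 2.0))


sigma_y = 140.0e6   # yield stress of AA1050A H8, Pa (value from the text)
e_modulus = 68.3e9  # Young's modulus, Pa (value from the text)
delta_b = 0.05      # elongation under dynamic loading (value from the text)

beta_min = collision_angle_deg(sigma_y / e_modulus)
beta_max = collision_angle_deg(delta_b)
print(f"{beta_min:.1f} deg < beta < {beta_max:.1f} deg")
```

The lower bound is governed by the small yield strain of about 0.2 %, so it is much more sensitive to the assumed yield stress than the upper bound is to the elongation.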
6 The Analysis of Deformation of the Clad Plate Falling
on a Wavy Surface


The analytical study above predicts a wavy shape of the base plate under moving explosion loading. Therefore, a numerical analysis was carried out of the elastic–plastic deformation of the clad plate under moving explosion loading with a wavy base plate. The following figures present time slices of the Mises stress in the clad plate.

One can see in Fig. 20 that the clad plate can fill the gaps of the wavy surface only through plastic deformation. After the shock wave passes, the stresses quickly relax; however, the shape of the obtained surface remains largely the same. Thus, the behavior of the clad plate is similar to that of a liquid, which resembles experimental results.


7 Conclusions 


Based on the relations of elastodynamics, this paper models the typical wave initiation and wave-shaped structure of the substrate surface due to explosion welding. Here, the substrate is modeled as a half-plane, and the contact interaction of the specimens is reduced to a local traction loading, which moves with a constant subsonic velocity along the surface of the half-plane. The solution of this problem in the moving reference frame is obtained based on the Fourier integral transform for three cases of the ratio of the detonation velocity to the elastic wave speed in the half-plane material. Numerical studies for an aluminium alloy half-plane show that resonance phenomena are observed when the detonation velocity tends to the Rayleigh wave velocity in the substrate material. At certain collision angles of the specimens, detonation velocities, and especially magnitudes of loading, the surface of the half-plane in the subsonic, transonic and supersonic cases acquires the wave shape typical of explosion welding. Regardless of the number of assumptions and approximations made in modeling the explosion welding process, the results obtained explain the experimentally observed wave-shaped structure of the interface within the framework of elastodynamics.

Fig. 20 Dynamics of deformation of the clad plate during welding with a wavy surface


Modelling of the Stress–Strain State
of a Viscoelastic Rectangular Plate
in Condition Combine Load


I. Pasternak, I. Turchyn, V. Hutsaylyuk, and H. Sulym 


Abstract Using methods for solving dynamic problems for inhomogeneous media and boundary elements for the study of geometrically complex, physically heterogeneous and nonlinear finite-dimensional bodies, the dynamic problem of linear elasticity theory for a thin rectangular plate, loaded on opposite sides with evenly distributed stresses, has been solved. On the loaded sides $y_1 = \pm h$ the longitudinal deformations are assumed to be zero. The use of the Laplace integral transform with respect to time and the finite Fourier cosine transform with respect to the longitudinal coordinate $y_1$ yielded general formulas for the transforms. They were inverted using the partial fraction expansion theorem. For this purpose, the characteristic equation and the three types of its roots, whose values are also determined by the load function, have been thoroughly examined. For a special two-parameter form of the monotonic load function, which gives a physically realistic picture of a fast but smooth, rather than abrupt, increase in load, these transforms were inverted, yielding convergent series that determine the stress and strain fields at any time. To allow research in the more general case, when a dynamic component is superimposed on a permanent load, a static solution of the analogous problem was also obtained. Calculations were made for four values of the load parameter that determines how rapidly the load increases. The results obtained meet all conditions of credibility.


1 Introduction 


Determination of the dynamic properties of materials, especially those related to their strength and ability to retain shape, is a vital problem for modern engineering studies, especially in the aircraft industry and aerospace engineering [1]. Numerous experimental studies have revealed a number of peculiarities in the dynamic behavior of materials at the yield point that are not observed under quasi-static loading. One of the main peculiarities of dynamic deformation is the influence of the strain rate on the mechanical properties of the material. It is observed that under high-speed or impact loading the yield point of most structural materials is reached at substantially higher levels of internal stress than under slow loading.

Numerous studies have shown that the main reason for this behavior is the sensitivity of the material to the strain rate. Several experimental studies have made it possible to plot and approximate the dependence of the yield stress on the strain rate for various materials. The majority of the materials used in modern structural elements exhibit essentially the same dependence of the yield stress on the strain rate, which is plotted in Fig. 1.

The presented results feature three characteristic intervals of the strain rate: rapid growth of the yield stress (II) and its quasi-constant behavior (I, III). Moreover, the increase in the yield stress for the majority of structural materials is within 80–100%.

Therefore, processes that feature an intensive change in the strain rate due to technological conditions or other external and internal factors can be accompanied by effects that are not inherent to the classical understanding of material strength [2].


Fig. 1 Dependence of the yield stress on the deformation speed

The aim of this work is to study the time evolution of the displacement, strain and stress fields in the rectangular plate and to determine the main dependences and phenomena of the strain rate change as a function of the intensity and rate of load application, the possible loss of strain energy and the dimension of the process.

The work consists of three parts. First, a one-dimensional problem of elasticity is considered for a piecewise-homogeneous plate, which models the contact of a plate with inertial equipment, which in turn produces the conditions for energy loss. Next, using the plane dynamic problem of elasticity as an example, the influence of finite size and loading rate on the deformation process of the rectangular plate without loss of elastic energy is studied. Finally, studies on energy dissipation and the formation of wave structures are presented.


1. Dynamic problem for a piece-wise homogeneous plate 


It is known that in dynamic problems for finite solids the rate of external loading is 
crucial both for process identification (dynamic, quasi-static) and for determination 
of the intensity of transient stress-strain state and the strain rate. However, the studies 
of the authors [3] show that the value of strain rate is constant for the considered 
problem; therefore, the yield stress is constant too. This behavior is explained by the 
fact that this problem does not consider possible dissipation of strain energy, or its 
transformation. One of the ways to identify the dissipation of the elastic-plastic strain 
energy is to account for the inertia of measurement equipment, or in other words, to 
account for a perfect mechanical contact of a plate with a massive elastic medium 
during the deformation process [4]. 

For this reason, this study considers the dynamic elastic-plastic problem for a rectangular plate, which is perfectly bonded to a half-strip made of another material (Fig. 2).

The source of transient processes in the plate is the loading of its edge with a time-dependent traction $p(t)$.

The mathematical formulation of the problem consists in determining the solution of the initial-boundary value problem for the displacement function $u^{(i)}(x,\tau)$ in the plate ($i = 1$) and in the half-strip ($i = 2$), which consists of


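• the equations of motion

$$\frac{\partial^2 u^{(1)}}{\partial x^2} = \frac{\partial^2 u^{(1)}}{\partial\tau^2},\; x\in[0;1]; \qquad \frac{\partial^2 u^{(2)}}{\partial x^2} = c^2\frac{\partial^2 u^{(2)}}{\partial\tau^2},\; x\in(-\infty;0]; \tag{1.1}$$

• the boundary condition

$$x=1:\quad \sigma_{xx}^{(1)} = \left(\lambda^{(1)}+2\mu^{(1)}\right)\frac{\partial u^{(1)}}{\partial x} = p(\tau); \tag{1.2}$$

• the coupling conditions

$$x=0:\quad u^{(1)} = u^{(2)},\quad \left(\lambda^{(1)}+2\mu^{(1)}\right)\frac{\partial u^{(1)}}{\partial x} = \left(\lambda^{(2)}+2\mu^{(2)}\right)\frac{\partial u^{(2)}}{\partial x}; \tag{1.3}$$

• the Sommerfeld radiation condition

$$\lim_{x\to-\infty} u^{(2)} = 0; \tag{1.4}$$

• and the initial conditions

$$u^{(i)}(x,0) = \frac{\partial u^{(i)}}{\partial\tau}(x,0) = 0. \tag{1.5}$$

Fig. 2 The sketch of the problem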


The following notation is used in Eqs. (1.1)–(1.5): $x = x_1/L$ and $\tau = c^{(1)}t/L$ are the dimensionless coordinate and time; $c^{(i)} = \sqrt{(\lambda^{(i)}+2\mu^{(i)})/\rho^{(i)}}$, $i = 1, 2$, are the phase velocities in the plate and the half-strip; $\lambda^{(i)}$, $\mu^{(i)}$, $\rho^{(i)}$ are their Lamé constants and densities; $c = c^{(1)}/c^{(2)}$.

Applying the Laplace integral transform [5] with respect to the time variable to Eqs. (1.1)–(1.5), one obtains


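$$\frac{\partial^2\bar u^{(1)}}{\partial x^2} = s^2\bar u^{(1)},\; x\in(0;1); \qquad \frac{\partial^2\bar u^{(2)}}{\partial x^2} = c^2s^2\bar u^{(2)},\; x\in(-\infty;0); \tag{1.6}$$

$$x=1:\quad \left(\lambda^{(1)}+2\mu^{(1)}\right)\frac{\partial\bar u^{(1)}}{\partial x} = \bar p(s); \tag{1.7}$$

$$x=0:\quad \bar u^{(1)} = \bar u^{(2)},\quad \left(\lambda^{(1)}+2\mu^{(1)}\right)\frac{\partial\bar u^{(1)}}{\partial x} = \left(\lambda^{(2)}+2\mu^{(2)}\right)\frac{\partial\bar u^{(2)}}{\partial x}; \tag{1.8}$$

$$\lim_{x\to-\infty}\bar u^{(2)} = 0, \tag{1.9}$$

where $\bar u(x,s) = \int_0^\infty \exp(-s\tau)\,u(x,\tau)\,d\tau$ is the Laplace transform, and $s$ is the parameter of the transform.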
Solutions of the transformed Eqs. (1.6), which satisfy the conditions (1.7)–(1.9), are obtained as

$$\bar u^{(1)}(x,s) = \frac{\bar p(s)\left(\tilde\mu\sinh(sx)+\cosh(sx)\right)}{\left(\lambda^{(1)}+2\mu^{(1)}\right)s\left(\sinh(s)+\tilde\mu\cosh(s)\right)}, \qquad \bar u^{(2)}(x,s) = \frac{\bar p(s)\exp(scx)}{\left(\lambda^{(1)}+2\mu^{(1)}\right)s\left(\sinh(s)+\tilde\mu\cosh(s)\right)}, \tag{1.10}$$

where $\tilde\mu = \sqrt{\dfrac{\left(\lambda^{(2)}+2\mu^{(2)}\right)\rho^{(2)}}{\left(\lambda^{(1)}+2\mu^{(1)}\right)\rho^{(1)}}}$.

The inverse of Eqs. (1.10) can be obtained based on the residue theorem [5]. For this purpose, first of all the roots of the following transcendental equation should be determined:

$$\sinh(s)+\tilde\mu\cosh(s) = 0. \tag{1.11}$$

These roots are sought in the form of complex numbers

$$s_k = \alpha + i\beta_k. \tag{1.12}$$

After substitution of Eq. (1.12) into Eq. (1.11), one obtains

$$\alpha = \operatorname{arctanh}\left(-\tilde\mu^{-1}\right), \qquad \beta_k = \frac{\pi}{2}+\pi k.$$
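These roots are easy to verify numerically. The sketch below assumes, as the use of the inverse hyperbolic tangent of the reciprocal requires, that the ratio multiplying cosh(s) in Eq. (1.11) exceeds one. The value `MU_RATIO = 5.0` and the function names are illustrative choices, not data from the source.

```python
import cmath
import math


def characteristic_roots(mu_ratio, count):
    """Roots s_k = alpha + i*beta_k of sinh(s) + mu_ratio*cosh(s) = 0 for mu_ratio > 1."""
    alpha = math.atanh(-1.0 / mu_ratio)
    return [complex(alpha, math.pi / 2 + math.pi * k) for k in range(count)]


def residual(s, mu_ratio):
    """Left-hand side of Eq. (1.11)."""
    return cmath.sinh(s) + mu_ratio * cmath.cosh(s)


MU_RATIO = 5.0  # hypothetical ratio, chosen only for illustration
ROOTS = characteristic_roots(MU_RATIO, 5)

for root in ROOTS:
    print(root, abs(residual(root, MU_RATIO)))
```

All roots share the same negative real part, so every term of the inverse transform decays at the same rate; the decay vanishes as the ratio grows without bound.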


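Consider the transform of the component $\varepsilon_{xx}^{(1)}(x,\tau)$ of the strain tensor:

$$\bar\varepsilon_{xx}^{(1)}(x,s) = \frac{\partial\bar u^{(1)}}{\partial x} = \frac{\bar p(s)\left(\tilde\mu\cosh(sx)+\sinh(sx)\right)}{\left(\lambda^{(1)}+2\mu^{(1)}\right)\left(\sinh(s)+\tilde\mu\cosh(s)\right)}. \tag{1.13}$$

Based on the residue theorem, the inverse of the function

$$\bar g(x,s) = \frac{\tilde\mu\cosh(sx)+\sinh(sx)}{\sinh(s)+\tilde\mu\cosh(s)}$$

is the following function:

$$g(x,\tau) = \frac{2\exp(\alpha\tau)}{\sqrt{\tilde\mu^2-1}}\left\{\left[\tilde\mu\cosh(\alpha x)+\sinh(\alpha x)\right]\sum_{k=0}^{\infty}(-1)^k\cos(\beta_k x)\sin(\beta_k\tau) + \left[\tilde\mu\sinh(\alpha x)+\cosh(\alpha x)\right]\sum_{k=0}^{\infty}(-1)^k\sin(\beta_k x)\cos(\beta_k\tau)\right\}. \tag{1.14}$$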

Thus, the convolution theorem [5] results in

$$\varepsilon_{xx}^{(1)}(x,\tau) = \frac{1}{\lambda^{(1)}+2\mu^{(1)}}\int_0^{\tau} p(\tau-t)\,g(x,t)\,dt. \tag{1.15}$$

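For the step-like loading $p(\tau) = p^* S_+(\tau)$, from Eq. (1.15) one obtains

$$\begin{aligned}
\varepsilon_{xx}^{(1)}(x,\tau) = \frac{2p^*}{\left(\lambda^{(1)}+2\mu^{(1)}\right)\sqrt{\tilde\mu^2-1}}\Bigg\{&\left[\tilde\mu\cosh(\alpha x)+\sinh(\alpha x)\right]\sum_{k=0}^{\infty}\frac{(-1)^k}{\alpha^2+\beta_k^2}\cos(\beta_k x)\left(\beta_k+\exp(\alpha\tau)\left[\alpha\sin(\beta_k\tau)-\beta_k\cos(\beta_k\tau)\right]\right)\\
+&\left[\tilde\mu\sinh(\alpha x)+\cosh(\alpha x)\right]\sum_{k=0}^{\infty}\frac{(-1)^k}{\alpha^2+\beta_k^2}\sin(\beta_k x)\left(\exp(\alpha\tau)\left[\alpha\cos(\beta_k\tau)+\beta_k\sin(\beta_k\tau)\right]-\alpha\right)\Bigg\}.
\end{aligned}$$

Fig. 3 Dependence of the applied load on time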

In practice, dynamic impact loading is always continuous in time and increases, faster or slower, from zero to its limiting value. Therefore, the high-speed increase in load is approximated by the relation $p(t) = p^*(1-\exp(-at))^2$, which can be expressed in terms of the dimensionless time $\tau$ as follows:

$$p(\tau) = p^*\left(1-\exp(-\tau_0\tau)\right)^2, \tag{1.16}$$

where $\tau_0 = La/c^{(1)}$. This relation makes the initial and boundary conditions compatible, and it also allows an accurate approximation of the real dependence of the dynamic load on time in many practically important cases (Fig. 3).
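The shape of the load (1.16) can be explored with a few lines of code. The sketch below evaluates the load ratio and the time needed to reach a given fraction of the limiting value; the 99% threshold and the function names are illustrative assumptions, not definitions from the source.

```python
import math


def load_ratio(time, rate):
    """p/p* = (1 - exp(-rate*time))**2, Eq. (1.16); works for (tau, tau0) or (t, a)."""
    return (1.0 - math.exp(-rate * time)) ** 2


def rise_time(rate, level=0.99):
    """Time at which the load reaches the given fraction of its limiting value."""
    return -math.log(1.0 - math.sqrt(level)) / rate


# With a = 50 per millisecond the load is within 1% of p* after about 0.1 ms
print(f"rise time for a = 50 1/ms: {rise_time(50.0):.3f} ms")
print(f"load ratio at tau0*tau = 5: {load_ratio(5.0, 1.0):.4f}")
```

Because both the load and its first time derivative vanish at the initial moment, the loading starts smoothly, which is what makes it compatible with zero initial conditions.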

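For the selected time dependence of the loading, the deformations inside the plate are defined by the following relation:

$$\begin{aligned}
\varepsilon_{xx}^{(1)}(x,\tau) = {}& \frac{2p^*}{\left(\lambda^{(1)}+2\mu^{(1)}\right)\sqrt{\tilde\mu^2-1}}\Bigg\{\left[\tilde\mu\cosh(\alpha x)+\sinh(\alpha x)\right]\sum_{k=0}^{\infty}(-1)^k\cos(\beta_k x)\\
&\times\Bigg(\frac{\beta_k+\exp(\alpha\tau)\left[\alpha\sin(\beta_k\tau)-\beta_k\cos(\beta_k\tau)\right]}{\alpha^2+\beta_k^2}\\
&\quad-2\,\frac{\beta_k\exp(-\tau_0\tau)+\exp(\alpha\tau)\left[(\alpha+\tau_0)\sin(\beta_k\tau)-\beta_k\cos(\beta_k\tau)\right]}{(\alpha+\tau_0)^2+\beta_k^2}\\
&\quad+\frac{\beta_k\exp(-2\tau_0\tau)+\exp(\alpha\tau)\left[(\alpha+2\tau_0)\sin(\beta_k\tau)-\beta_k\cos(\beta_k\tau)\right]}{(\alpha+2\tau_0)^2+\beta_k^2}\Bigg)\\
&+\left[\tilde\mu\sinh(\alpha x)+\cosh(\alpha x)\right]\sum_{k=0}^{\infty}(-1)^k\sin(\beta_k x)\\
&\times\Bigg(\frac{\exp(\alpha\tau)\left[\alpha\cos(\beta_k\tau)+\beta_k\sin(\beta_k\tau)\right]-\alpha}{\alpha^2+\beta_k^2}\\
&\quad-2\,\frac{\exp(\alpha\tau)\left[(\alpha+\tau_0)\cos(\beta_k\tau)+\beta_k\sin(\beta_k\tau)\right]-(\alpha+\tau_0)\exp(-\tau_0\tau)}{(\alpha+\tau_0)^2+\beta_k^2}\\
&\quad+\frac{\exp(\alpha\tau)\left[(\alpha+2\tau_0)\cos(\beta_k\tau)+\beta_k\sin(\beta_k\tau)\right]-(\alpha+2\tau_0)\exp(-2\tau_0\tau)}{(\alpha+2\tau_0)^2+\beta_k^2}\Bigg)\Bigg\}.
\end{aligned}\tag{1.17}$$

Accordingly, the components of the strain rate tensor at an arbitrary point of the plate are as follows: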


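$$\begin{aligned}
\dot\varepsilon_{xx}^{(1)}(x,\tau) = \frac{\partial\varepsilon_{xx}^{(1)}}{\partial\tau} = {}& \frac{2p^*}{\left(\lambda^{(1)}+2\mu^{(1)}\right)\sqrt{\tilde\mu^2-1}}\Bigg\{\left[\tilde\mu\cosh(\alpha x)+\sinh(\alpha x)\right]\sum_{k=0}^{\infty}(-1)^k\cos(\beta_k x)\\
&\times\Bigg(\exp(\alpha\tau)\sin(\beta_k\tau)\\
&\quad-2\,\frac{\exp(\alpha\tau)\left[(\alpha^2+\tau_0\alpha+\beta_k^2)\sin(\beta_k\tau)+\tau_0\beta_k\cos(\beta_k\tau)\right]-\tau_0\beta_k\exp(-\tau_0\tau)}{(\alpha+\tau_0)^2+\beta_k^2}\\
&\quad+\frac{\exp(\alpha\tau)\left[(\alpha^2+2\tau_0\alpha+\beta_k^2)\sin(\beta_k\tau)+2\tau_0\beta_k\cos(\beta_k\tau)\right]-2\tau_0\beta_k\exp(-2\tau_0\tau)}{(\alpha+2\tau_0)^2+\beta_k^2}\Bigg)\\
&+\left[\tilde\mu\sinh(\alpha x)+\cosh(\alpha x)\right]\sum_{k=0}^{\infty}(-1)^k\sin(\beta_k x)\\
&\times\Bigg(\exp(\alpha\tau)\cos(\beta_k\tau)\\
&\quad-2\,\frac{\exp(\alpha\tau)\left[(\alpha^2+\tau_0\alpha+\beta_k^2)\cos(\beta_k\tau)-\tau_0\beta_k\sin(\beta_k\tau)\right]+\tau_0(\alpha+\tau_0)\exp(-\tau_0\tau)}{(\alpha+\tau_0)^2+\beta_k^2}\\
&\quad+\frac{\exp(\alpha\tau)\left[(\alpha^2+2\tau_0\alpha+\beta_k^2)\cos(\beta_k\tau)-2\tau_0\beta_k\sin(\beta_k\tau)\right]+2\tau_0(\alpha+2\tau_0)\exp(-2\tau_0\tau)}{(\alpha+2\tau_0)^2+\beta_k^2}\Bigg)\Bigg\}.
\end{aligned}\tag{1.18}$$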
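Equation (1.18) should be the time derivative of Eq. (1.17). The sketch below checks this for the time-dependent factors of a single series term by comparing them with a central finite difference. It assumes the convolution structure described above, with load weights 1, -2 and 1 attached to the decay rates 0, tau0 and 2*tau0; the function names and the parameter values are illustrative.

```python
import math


def strain_terms(beta, tau, alpha, tau0):
    """Time factors of one term of Eq. (1.17): (cosine-series factor, sine-series factor)."""
    s_part = c_part = 0.0
    for weight, c in ((1.0, 0.0), (-2.0, tau0), (1.0, 2.0 * tau0)):
        a = alpha + c
        den = a * a + beta * beta
        growth, decay = math.exp(alpha * tau), math.exp(-c * tau)
        sin_bt, cos_bt = math.sin(beta * tau), math.cos(beta * tau)
        s_part += weight * (beta * decay + growth * (a * sin_bt - beta * cos_bt)) / den
        c_part += weight * (growth * (a * cos_bt + beta * sin_bt) - a * decay) / den
    return s_part, c_part


def rate_terms(beta, tau, alpha, tau0):
    """Time factors of one term of Eq. (1.18), i.e. the derivatives of strain_terms."""
    s_rate = c_rate = 0.0
    for weight, c in ((1.0, 0.0), (-2.0, tau0), (1.0, 2.0 * tau0)):
        a = alpha + c
        den = a * a + beta * beta
        q = alpha * alpha + c * alpha + beta * beta
        growth, decay = math.exp(alpha * tau), math.exp(-c * tau)
        sin_bt, cos_bt = math.sin(beta * tau), math.cos(beta * tau)
        s_rate += weight * (growth * (q * sin_bt + c * beta * cos_bt) - c * beta * decay) / den
        c_rate += weight * (growth * (q * cos_bt - c * beta * sin_bt) + c * a * decay) / den
    return s_rate, c_rate


BETA, TAU, ALPHA, TAU0, STEP = math.pi / 2, 0.7, -0.02, 1.0, 1e-6
S_HI, C_HI = strain_terms(BETA, TAU + STEP, ALPHA, TAU0)
S_LO, C_LO = strain_terms(BETA, TAU - STEP, ALPHA, TAU0)
S_RATE, C_RATE = rate_terms(BETA, TAU, ALPHA, TAU0)
print((S_HI - S_LO) / (2 * STEP), S_RATE)
print((C_HI - C_LO) / (2 * STEP), C_RATE)
```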


Based on Eq. (1.18), the component $\dot\varepsilon_{xx}^{(1)}(x,\tau)$ of the strain rate tensor is determined, and afterwards the time dependence of the dynamic viscoplastic yield stress is determined for different values of the intensity of strain energy dissipation (relations between the physical and mechanical properties of the plate and the half-strip) and of the rate of the applied external loading.

One can see in Fig. 4 that accounting for the permanent energy dissipation due to the equipment inertia makes the yield stress of the plate’s material time-dependent, and therefore can cause its specific behavior under limit loading (close to the fracture limit). For values $\alpha \to 0$, which correspond


to the case of a rigidly fixed edge of the plate (elastic waves reflect from the edge of the plate without loss of their strain energy), the yield stress does not change during the deformation process, and therefore the nonclassical behavior of the material close to the yield strength is not observed. An increase in the parameter that characterizes strain energy dissipation shortens the time interval over which the yield strength changes; for instance, for $\alpha = -0.15$ it is only 10% of the corresponding interval for $\alpha = -0.02$. Furthermore, even for small values of the parameter $\alpha$ (intensive strain energy dissipation) there exists a time interval where the yield strength changes only slightly, which is in good agreement with the experimental data [3].

Fig. 4 Time dependence of the yield stress of the plate’s material for various intensities of strain energy dissipation

The same features arise in the study of the influence of the load rate on the change in the yield stress. For example, Fig. 5 depicts numerical results obtained for $\alpha = -0.02$. Note that for a plate made of the aluminium alloy 2024-T3 ($E = 6.9$ GPa, $\nu = 0.3$) of length $L = 0.15$ m, the value $\tau_0 = 1$ of the dimensionless parameter corresponds to $a = 50\ \mathrm{ms}^{-1}$; in other words, for $\tau_0 = 1$ the external load reaches its stationary value in 0.1 ms. Furthermore, the numerical calculations show that the results obtained for $\tau_0 = 10$ practically coincide with those obtained for step-like loading. As the loading rate decreases, both the time needed for the change in the yield stress and the value by which it changes decrease. For $\tau_0 = 0.2$ the material of the plate shows practically no viscoplastic properties.


2. Plane dynamic problem of elasticity for the rectangular plate 


Now consider the two-dimensional case for homogeneous plate. Such model can 
be obtained from the previous physical model assuming that the half-strip, which is 
bonded with a plate, does not influence the inertia of the plate. This can be achieved 
assuming that elastic moduli or the density of the half-strip are much higher than 
corresponding properties of the plate. 

Consider a rectangular plate of size $2l \times 2h$ along $x_1$ and $y_1$, respectively (Fig. 6). At the time $t = 0$ the plate is loaded with a normal tension $p(t)$ applied to its fixed edges $x_1 = \pm l$. The boundaries $y_1 = \pm h$ are traction-free during the entire deformation process. The following dimensionless variables and constants are introduced: $x = x_1/l$, $y = y_1/l$, $\tau = c_1 t/l$, $y_0 = h/l$, $\kappa = c_1/c_2 = \sqrt{(\lambda+2\mu)/\mu}$, where $c_1$ and $c_2$ are the phase velocities of longitudinal and transverse waves in the material of the plate, and $\lambda$ and $\mu$ are its elastic moduli (Lamé constants).

Fig. 5 Time dependence of the yield stress of the plate’s material for various values of the external loading rate

Fig. 6 Sketch of the problem
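For an isotropic material the ratio of the two wave speeds depends on the Poisson ratio alone, which is convenient when evaluating the formulas of this section. The short sketch below uses the standard relation between the Lamé constants and the Poisson ratio; the function name is illustrative, and the Poisson ratio 0.3 is the value quoted for the alloy in the text.

```python
import math


def wave_speed_ratio(poisson_ratio):
    """kappa = c1/c2 = sqrt((lambda + 2*mu)/mu) = sqrt(2*(1 - nu)/(1 - 2*nu))."""
    return math.sqrt(2.0 * (1.0 - poisson_ratio) / (1.0 - 2.0 * poisson_ratio))


KAPPA = wave_speed_ratio(0.3)
print(f"kappa = {KAPPA:.4f}, kappa^2 = {KAPPA ** 2:.2f}")
```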

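In terms of these variables, assuming that before the time $\tau = 0$ the plate was at rest, the problem has the following mathematical formulation:

$$\frac{\partial^2\theta}{\partial x^2}+\frac{\partial^2\theta}{\partial y^2} = \frac{\partial^2\theta}{\partial\tau^2}; \tag{2.1}$$

$$\frac{\partial^2 u_y}{\partial x^2}+\frac{\partial^2 u_y}{\partial y^2} = \kappa^2\frac{\partial^2 u_y}{\partial\tau^2} - (\kappa^2-1)\frac{\partial\theta}{\partial y}; \tag{2.2}$$

$$\theta = \frac{\partial\theta}{\partial\tau} = 0,\quad u_y = \frac{\partial u_y}{\partial\tau} = 0,\quad \tau = 0; \tag{2.3}$$

$$u_y(\pm1,y,\tau) = 0,\quad \sigma_{xx}(\pm1,y,\tau) = p(\tau); \tag{2.4}$$

$$\sigma_{yy}(x,\pm y_0,\tau) = 0,\quad \sigma_{xy}(x,\pm y_0,\tau) = 0, \tag{2.5}$$

where $\theta(x,y,\tau) = \partial u_x/\partial x + \partial u_y/\partial y$ is the volumetric expansion, $u_x(x,y,\tau)$ and $u_y(x,y,\tau)$ are the components of the elastic displacement vector, and

$$\begin{aligned}
&\sigma_{xx} = \lambda\theta + 2\mu\varepsilon_{xx},\quad \sigma_{yy} = \lambda\theta + 2\mu\varepsilon_{yy},\quad \sigma_{xy} = 2\mu\varepsilon_{xy};\\
&\varepsilon_{xx} = \frac{\partial u_x}{\partial x},\quad \varepsilon_{yy} = \frac{\partial u_y}{\partial y},\quad \varepsilon_{xy} = \frac{1}{2}\left(\frac{\partial u_x}{\partial y}+\frac{\partial u_y}{\partial x}\right)
\end{aligned}\tag{2.6}$$

are the components of the stress and strain tensors, respectively.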
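From the conditions (2.4), accounting for

$$\theta(\pm1,y,\tau) = \left.\left(\frac{\partial u_x}{\partial x}+\frac{\partial u_y}{\partial y}\right)\right|_{x=\pm1} = \left.\frac{\partial u_x}{\partial x}\right|_{x=\pm1},$$

it follows that

$$\mu^{-1}\sigma_{xx}(\pm1,y,\tau) = \kappa^2\theta(\pm1,y,\tau). \tag{2.7}$$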


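The Laplace integral transform with respect to time [5] and the finite Fourier cosine transform with respect to the spatial variable $x$ [5] can be applied to Eq. (2.1). Accounting for the symmetry of the problem, the zero initial conditions (2.3), relations (2.7) and conditions (2.4), Eq. (2.1) takes the form

$$\frac{d^2\bar\theta_n}{dy^2} - \left(\xi_n^2+s^2\right)\bar\theta_n = (-1)^{n+1}\frac{2\xi_n}{\mu\kappa^2}\bar p(s), \tag{2.8}$$

where $\xi_n = \pi(2n+1)/2$, $n = 0, 1, 2, \ldots$, and $\bar\theta_n(y,s) = \int_{-1}^{1}\cos(\xi_n x)\int_0^\infty\theta(x,y,\tau)\exp(-s\tau)\,d\tau\,dx$ is the dual Laplace and Fourier transform.

Applying the same dual transform to Eq. (2.2), one obtains

$$\frac{d^2\bar v_n}{dy^2} - \left(\xi_n^2+\kappa^2s^2\right)\bar v_n = -(\kappa^2-1)\frac{d\bar\theta_n}{dy}, \tag{2.9}$$

where $\bar v_n(y,s) = \int_{-1}^{1}\cos(\xi_n x)\int_0^\infty u_y(x,y,\tau)\exp(-s\tau)\,d\tau\,dx$.

Accounting for the fact that the function $\bar\theta_n(y,s)$ is even with respect to the argument $y$, the solution of Eq. (2.8) is as follows:

$$\bar\theta_n(y,s) = A_n(s)\cosh(\gamma_1 y) + \frac{(-1)^n 2\xi_n\bar p(s)}{\mu\kappa^2\gamma_1^2}, \tag{2.10}$$

with $\gamma_1 = \sqrt{\xi_n^2+s^2}$.

Accounting for (2.10), the solution of Eq. (2.9) is written as

$$\bar v_n(y,s) = B_n(s)\sinh(\gamma_2 y) + \frac{\gamma_1}{s^2}A_n(s)\sinh(\gamma_1 y), \tag{2.11}$$

with $\gamma_2 = \sqrt{\xi_n^2+\kappa^2s^2}$.

The other component $\bar u_n(y,s) = \int_{-1}^{1}\sin(\xi_n x)\int_0^\infty u_x(x,y,\tau)\exp(-s\tau)\,d\tau\,dx$ of the displacement vector can be obtained from the relation $\bar u_n = \xi_n^{-1}\left(\bar\theta_n - d\bar v_n/dy\right)$ as

$$\bar u_n(y,s) = -B_n(s)\,\xi_n^{-1}\gamma_2\cosh(\gamma_2 y) - \frac{\xi_n}{s^2}A_n(s)\cosh(\gamma_1 y) + \frac{(-1)^n 2\bar p(s)}{\mu\kappa^2\gamma_1^2}. \tag{2.12}$$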


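The two quantities $A_n(s)$ and $B_n(s)$ can be obtained from the boundary conditions (2.5), whose dual Laplace and Fourier transform is written as

$$(\kappa^2-2)\bar\theta_n + 2\frac{d\bar v_n}{dy} = 0,\quad y = \pm y_0; \qquad \frac{d\bar u_n}{dy} - \xi_n\bar v_n = 0,\quad y = \pm y_0. \tag{2.13}$$

Accounting for Eqs. (2.10)–(2.12), from Eq. (2.13) one obtains

$$A_n(s) = \frac{s^2\left(\xi_n^2+\gamma_2^2\right)\sinh(\gamma_2 y_0)\,P_n(s)}{\gamma_1^2\,\Delta(\xi_n,s)}, \qquad B_n(s) = -\frac{2\xi_n^2\sinh(\gamma_1 y_0)\,P_n(s)}{\gamma_1\,\Delta(\xi_n,s)}, \tag{2.14}$$

where $\Delta(\xi_n,s) = 4\xi_n^2\gamma_1\gamma_2\sinh(\gamma_1 y_0)\cosh(\gamma_2 y_0) - \left(\xi_n^2+\gamma_2^2\right)^2\cosh(\gamma_1 y_0)\sinh(\gamma_2 y_0)$ and $P_n(s) = (-1)^n\dfrac{2\xi_n(\kappa^2-2)}{\mu\kappa^2}\,\bar p(s)$.

Finally, with the obtained values of $A_n(s)$ and $B_n(s)$, the transforms of the transient displacements are equal to

$$\begin{aligned}
\bar v_n(y,s) &= \frac{P_n}{\gamma_1\Delta}\left[\left(\xi_n^2+\gamma_2^2\right)\sinh(\gamma_2 y_0)\sinh(\gamma_1 y) - 2\xi_n^2\sinh(\gamma_1 y_0)\sinh(\gamma_2 y)\right],\\
\bar u_n(y,s) &= \frac{P_n}{\gamma_1^2}\left(\frac{1}{\xi_n(\kappa^2-2)} + \frac{\xi_n}{\Delta}\left[2\gamma_1\gamma_2\sinh(\gamma_1 y_0)\cosh(\gamma_2 y) - \left(\xi_n^2+\gamma_2^2\right)\sinh(\gamma_2 y_0)\cosh(\gamma_1 y)\right]\right).
\end{aligned}\tag{2.15}$$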


The inverse Laplace transform is obtained, as in the previous section, using the partial fraction expansion theorem [5]. For this purpose, one should first consider the first expression in Eq. (2.15) and find the singular points of its denominator. The roots of the equation $\gamma_1 = 0$ are not singular points of the denominator; therefore, consider the following equation:

$$4\xi_n^2\gamma_1\gamma_2\sinh(\gamma_1 y_0)\cosh(\gamma_2 y_0) - \left(\xi_n^2+\gamma_2^2\right)^2\cosh(\gamma_1 y_0)\sinh(\gamma_2 y_0) = 0. \tag{2.16}$$

The roots of the characteristic Eq. (2.16) are purely imaginary and form complex conjugate pairs. Therefore, it is convenient to change variables with $s = i\eta$ to obtain $\gamma_1 = \sqrt{\xi_n^2-\eta^2}$ and $\gamma_2 = \sqrt{\xi_n^2-\kappa^2\eta^2}$, respectively. The roots $\eta_{n,k}$ depend on the discrete parameter $\xi_n$; therefore, there are three possible cases of their location:

$$0 < |\eta_{n,k}| < \kappa^{-1}\xi_n; \qquad \kappa^{-1}\xi_n < |\eta_{n,k}| < \xi_n; \qquad |\eta_{n,k}| > \xi_n. \tag{2.17}$$

For the first interval, the characteristic equation preserves its form (2.16) and has a finite number of roots $\eta_{n,k,1}$.

For the interval $\kappa^{-1}\xi_n < |\eta_{n,k}| < \xi_n$, the characteristic equation is written as

$$4\xi_n^2\gamma_1\tilde\gamma_2\sinh(\gamma_1 y_0)\cos(\tilde\gamma_2 y_0) - \left(\xi_n^2-\tilde\gamma_2^2\right)^2\cosh(\gamma_1 y_0)\sin(\tilde\gamma_2 y_0) = 0, \tag{2.18}$$

for $\tilde\gamma_2 = \sqrt{\kappa^2\eta^2-\xi_n^2}$. Equation (2.18) has a finite number of roots $\eta_{n,k,2}$. Respectively, for the interval $|\eta_{n,k}| > \xi_n$ the equation

$$4\xi_n^2\tilde\gamma_1\tilde\gamma_2\sin(\tilde\gamma_1 y_0)\cos(\tilde\gamma_2 y_0) + \left(\xi_n^2-\tilde\gamma_2^2\right)^2\cos(\tilde\gamma_1 y_0)\sin(\tilde\gamma_2 y_0) = 0, \tag{2.19}$$

for $\tilde\gamma_1 = \sqrt{\eta^2-\xi_n^2}$, has an infinite number of roots $\eta_{n,k,3}$.
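The location of the roots in the three intervals can be explored numerically. The sketch below evaluates the left-hand side of the characteristic equation on the imaginary axis with complex arithmetic, takes the real or imaginary part depending on the interval, and refines sign changes by bisection. It is an illustration rather than the authors' procedure: the aspect ratio `Y0 = 0.5`, the scan range, the grid density and all function names are assumptions, and the wave speed ratio corresponds to a Poisson ratio of 0.3.

```python
import cmath
import math


def characteristic_value(eta, xi, kappa, y0):
    """Real-valued form of the characteristic equation for s = i*eta."""
    g1 = cmath.sqrt(complex(xi ** 2 - eta ** 2, 0.0))
    g2 = cmath.sqrt(complex(xi ** 2 - (kappa * eta) ** 2, 0.0))
    value = (4 * xi ** 2 * g1 * g2 * cmath.sinh(g1 * y0) * cmath.cosh(g2 * y0)
             - (xi ** 2 + g2 ** 2) ** 2 * cmath.cosh(g1 * y0) * cmath.sinh(g2 * y0))
    # Below xi/kappa the expression is real; above it, it is purely imaginary.
    return value.real if eta < xi / kappa else value.imag


def bisect(func, lo, hi, iterations=80):
    """Plain bisection on a bracket [lo, hi] with a sign change."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def find_roots(xi, kappa, y0, eta_max, steps=2000):
    """Return (interval_number, eta) pairs found by scanning the three intervals."""
    def func(eta):
        return characteristic_value(eta, xi, kappa, y0)

    edges = [0.0, xi / kappa, xi, eta_max]
    roots = []
    for number, (left, right) in enumerate(zip(edges, edges[1:]), start=1):
        pad = 1e-3 * (right - left)  # stay clear of the interval ends
        grid = [left + pad + (right - left - 2 * pad) * i / steps for i in range(steps + 1)]
        for a, b in zip(grid, grid[1:]):
            if (func(a) < 0) != (func(b) < 0):
                roots.append((number, bisect(func, a, b)))
    return roots


XI_0 = math.pi / 2        # xi_n for n = 0
KAPPA = math.sqrt(3.5)    # wave speed ratio for a Poisson ratio of 0.3
Y0 = 0.5                  # hypothetical aspect ratio h/l
ROOTS = find_roots(XI_0, KAPPA, Y0, eta_max=6.0)

for number, eta in ROOTS:
    print(f"interval {number}: eta = {eta:.6f}")
```

Scanning each interval separately avoids comparing values across the branch changes, where the expression switches between real and imaginary.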

To obtain the inverse of Eq. (2.15) using the partial fraction expansion theorem, one should evaluate the derivative of the denominator [5]. To this end, consider the derivative of $\Delta(\xi_n, s)$ for the different intervals of location of the roots of the characteristic equation:


d& 
O<Inl< ™; 


ds 


x F y2 
= Ay(n,k) = inn k 1 [asa] 2 sin h(y1 yo) cos h(y2 yo)+ 


s=tnn k,1 


al 


sin h(y| yo) cosh(y2 yo) + xoy2 cos h(y1 yo) cos h(y2 yo) + 


+ «7 yo71 sin h(y1 ¥0) sin h(72y0)] — 44? (& + 72) e058 A(V10) sit h(72 Y0)— 


ee , 2«2yq 
— (&% +72) yo sin (niyo) sinh(ry0)/r1 — (&% + FZ) 2 cosh(r1y0)cosh(yy0)}- (2.20) 
v2 
& d * YQ. - 
=< |nl < én; —_ = Ag(n, k) = Fm k,2 46? ” sin h(y1 yo) cos(¥2 yo)— 
- ds sayy Eo YI 
ey 
= oe sin h(1 yo) cos(¥2 yo) + xo¥2 cos h(y1 yo) cos(¥2.Yo)+ 
2 : ¥ o em ¥ ine 
+ «yoy sinh(y1 yo) sin(¥2y0) | = 4x? (& = 7) cos h(yj yo) sin(y2y0)— 
2 2\2. — 2 =2\2«7y0 . hl 
- (&; - 3) yo sin h(y1 yo) sin(y¥2yo)/v1 + (& - 7) a cos h(y1 yo) cos(¥2 yo) (2.21) 
: as - of - 
In| > En; ae = A3(n, k) = F0n,k,3)4€n | — sin(”1 yo) cos(¥2.y0)+ 
as S=inp k 3 ”1 
raed 


+ ra sin(Y1 yo) Cos(72 Yo) + yo¥2 Cos(1 Yo) cos(¥2 Yo) — 


— «7 yo%1 sin(A yo) sin(Payo)] — 4x7 (& ~ #2) cos(¥1 y9) sin(72. yo) — 


2y 2 Ky, 
= (3 = 7) a sin(7 yo) sin(72 9) + (3 = 2) cos(7| yo) Cos(¥2 Yo) (2.22) 


The second expression in Eq. (2.15), besides the roots of the characteristic Eq. (2.16), has singular points at the roots of the equation $\gamma_1 = 0$: $s = \pm i\xi_n$. Accounting for this, the final expressions for the transient components of the displacement vector are written as


4 2 
ux (xX, y,T) = — l— <= x 
ib Ke 


0 1 2y1 ya sin h(y1 90) cos h(ypy) — (267 — «2m? g 1) sin (72.90) eos A(V1y) 
nz,2 aK, 
x Vevey > 
n=0 


= yA (a,b) 


Ko 2y1 Jo sin h(y] yo) cos(7y) — (282 = ee) sin(72 yp) cos h(yy y) 
x f(n,k.1+ 7) t ye 


fam ypAo, kb) 


x 


Ff (n,k,25 7) + 




22 2} Po sin(H1 yo) cos(Pry) + (27 — K? nN? , 3) Sin(P2¥9) cos(71 y) 
+ >> ic a ) f(1n.k3>T) } sinEnx); (2.23) 
k=1 Yj A3(n, k) 


4 2 
uy(xX,y,T) = *( - =)x 
: in a) 


S i SS (267 — «292, 1) sin h(y2y0) sin hy y) — 267 sin A(y1 yo) sinha) 
x - ny = x 
ces = yi Ain, k) 
ee > (262 — «2n? 4,9) sin(Pay0) sin (1 y) — 262 sin A(y1 90) sin( Fay) Pett 
J n,k,1> T = zu n,k,2> 
k=1 yi A2(n, k) 


= (267 — «22 4.3) sim(72yo) sin (Vi y) — 267 sin(F1 yo) sin(F2y)) 
k=l 7 Ag (n, k) 


f Mn,k,3>T) cos(Enx), 


where

$$f(\eta,\tau) = \int_0^{\tau} p(\tau-t)\sin(\eta t)\,dt. \tag{2.24}$$

Equations (2.23) express the exact closed-form solution of the transient dynamic problem of elasticity for a rectangular plate under an arbitrarily time-dependent external load.

The components of the strain and stress tensors are evaluated by applying the strain-displacement and stress-strain relations (2.6) to the obtained displacement components (2.23). It can also be shown that all series in the solution (2.23) converge uniformly; therefore, the differentiation operators used in the derivation of the stress and strain tensors can be applied directly under the summation sign.

Particularly, to evaluate stresses at the arbitrary point of the plate based on 
Eqs. (2.6) and (2.23) one obtains the following formulae 


ace “2 y2 sin h(71 Yo) cos h(y2y) 
acts.) =4(1- 3) (-1)” — ——~ f (n.k.1,T)— 
XX ha » En » yzAi(n, b f ( nk, 1 ) 


ti (262 — «72 «1 ) (262 + (7 — 2)2 «.1) sin h(yay0) cos (Ay) 


= Sf (1m,k,1s T) + 
= yr Ai(a, k) . 
ko ahs Gi ie 
~ 217 sin h(i yo) cos(y) 
t TA Ff (11n,k,25 t)- 
= yp Aain, k) 
ss (282 — xn? 2) (282 + (« — 2)n2. 4.2) sin(Fay0) cos h(n y) Pepi 
x F(Mn,k.s T)4+ 
= vy? Ao(n, k) 7 
2” y2 sin(V1 yo) cos(72y) 
c us i Sf (1m.k.3: t)+ 


pPA3(n, k) 


(28? = ne.) (28 at (x? = 2)n2 4.3) sin(Y2 yo) Cos(1 y) 
PP A3(n, k) 


Ff (11n.k,35 t) cos(Enx); (2.25) 
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oe, =4(1 _ 2\y ee iyttte, > 2y1y2 sinh(y1 yo) cos h(y2y) finsacd= 
n=0 vi Ai(n, k) 


ki (262 - ng) sin h(y2yo) cosh(y) 


Sf (1mn,kts T) + 


k=1 ypAin, k) 
_ YO 2r1%2 sin h(n1 yo) cos(ry) (ose- 
4 y? Ao(n, k) — 
2 
ky (262 — «a2 ao palace 
~ J "n,k, +t +e 
a yp Adin, k) we 
271% sin(f1 yo) cos(Pry) 
Ff (Mn,k,3 T)+ 
X FP Asin, k) meee) 
oo (2&? — x77?) sin(P2y0) cos(f1y) 
x. ( y F(ttn.k,3+T) | c08(Enx); (2.26) 
fel p7 As(n, k) 
cot n-=a(t~ 3) Peo" ef et er ema)* 
= MAM, 


sin h(y2x0) sin h(y1x) 
x Nn,k.l.T 5: 


yi Arn, k) (287 = 0 mgt) S (tn kas T)+ 
k=1 i 


3 sin h(y, x0) sin(y2x) — sin(2xo0) sin h(x) (2 


v1 Ao(n, k) bx ne go) F (Mn,k2» T) + 
k=l 


= sin(J1 x0) sin(frx) — sin(P2x0) sin(7ix) ( 


a Asta B EF = Me 3) £(n,k,3>T) f SiM(Eny). (2.27) 
1A3(n, 


k=1 


For the numerical analysis, the external loading is taken in the form of the time dependence (1.16):

$$p(\tau) = p^*\left(1-\exp(-\tau_0\tau)\right)^2.$$

Numerical calculations are performed for a plate made of the aluminum alloy 2024-T3 [6], whose Young's modulus and Poisson's ratio are $E = 6.9$ GPa and $\nu = 0.3$, respectively. Note that the dimensionless time $\tau = 1$ corresponds to the time $t = 0.02$ ms for a plate of length $2l = 0.3$ m.

Evaluating the integral (2.24) for the load-time dependence given by Eq. (1.16), one obtains the explicit expression for the function $f(\eta,\tau)$:

$$f(\eta,\tau) = p^*\left[\frac{1}{\eta} - \frac{2\eta\exp(-\tau_0\tau)}{\eta^2+\tau_0^2} + \frac{\eta\exp(-2\tau_0\tau)}{\eta^2+4\tau_0^2} + \cos(\eta\tau)\left(\frac{2\eta}{\eta^2+\tau_0^2} - \frac{\eta}{\eta^2+4\tau_0^2} - \frac{1}{\eta}\right) + \sin(\eta\tau)\left(\frac{2\tau_0}{\eta^2+4\tau_0^2} - \frac{2\tau_0}{\eta^2+\tau_0^2}\right)\right]. \tag{2.28}$$

Fig. 7 Longitudinal stress $\sigma_{xx}(0.5, 0, \tau)/p^*$ in the plate obtained for different values of the loading rate
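The closed form (2.28) can be checked against a direct numerical evaluation of the convolution integral (2.24). The sketch below does this with a composite Simpson rule; the load amplitude is set to one, and the function names, sample arguments and number of subintervals are illustrative assumptions.

```python
import math


def load_ratio(tau, tau0):
    """Dimensionless load (1 - exp(-tau0*tau))**2 of Eq. (1.16)."""
    return (1.0 - math.exp(-tau0 * tau)) ** 2


def f_closed(eta, tau, tau0):
    """Closed form of f(eta, tau)/p* for the load (1.16)."""
    e1, e2 = math.exp(-tau0 * tau), math.exp(-2.0 * tau0 * tau)
    d1, d2 = eta ** 2 + tau0 ** 2, eta ** 2 + 4.0 * tau0 ** 2
    return (1.0 / eta - 2.0 * eta * e1 / d1 + eta * e2 / d2
            + math.cos(eta * tau) * (2.0 * eta / d1 - eta / d2 - 1.0 / eta)
            + math.sin(eta * tau) * (2.0 * tau0 / d2 - 2.0 * tau0 / d1))


def f_quadrature(eta, tau, tau0, intervals=2000):
    """Composite Simpson approximation of the integral in Eq. (2.24), with p* = 1."""
    step = tau / intervals
    total = 0.0
    for i in range(intervals + 1):
        t = i * step
        weight = 1.0 if i in (0, intervals) else (4.0 if i % 2 else 2.0)
        total += weight * load_ratio(tau - t, tau0) * math.sin(eta * t)
    return total * step / 3.0


print(f_closed(3.0, 2.0, 5.0), f_quadrature(3.0, 2.0, 5.0))
```

In the limit of a very fast load the expression reduces to (1 - cos(eta*tau))/eta, the response to a step.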


Figure 7 shows the time dependence of the dimensionless longitudinal stress $\sigma_{xx}/p^*$ at a point located on the symmetry axis of the plate ($x = 0.5$, $y = 0$). The stress is calculated by means of Eqs. (2.25) and (2.28) for different values of the parameter $\tau_0$ (loading rate).

One can see in Fig. 7 that the time dependence of the longitudinal stress depends essentially on the parameter $\tau_0$: for $\tau_0 = 0.2$ this dependence is quasi-static (it repeats the time dependence of the external load), and for $\tau_0 > 10$ it practically coincides with the vibrations caused by the abrupt (shock) loading $p(\tau) = p^* S_+(\tau)$. An increase in the parameter $\tau_0$ also increases the amplitude of vibrations and decreases the arrival time of the first wave front.

The same conclusions can be made for the time dependence of the dimensionless transverse stress at the same point (Fig. 8) and of the dimensionless shear stress at the point $x = 0.5$, $y = 0.15$ (Fig. 9), depending on the parameter $\tau_0$.

However, unlike the previous results, the influence of the parameter $\tau_0$ on the amplitude of vibrations is more significant: increasing the loading rate parameter from $\tau_0 = 1$ to $\tau_0 = 5$ increases the amplitude of the transverse stress up to three times and the shear stress up to two times. At the same time, the longitudinal stress increases by only 40–45%.

The spatial distribution of the dimensionless transverse stress $\sigma_{yy}(x,y,\tau)/p^*$ at different moments of time is presented in Fig. 10.


Fig. 8 Transverse stress $\sigma_{yy}(0.5, 0, \tau)/p^*$ in the plate obtained for different values of the loading rate

Fig. 9 Shear stress $\sigma_{xy}(0.5, 0.15, \tau)/p^*$ in the plate obtained for different values of the loading rate

Besides, one can see that the maximal magnitude of the longitudinal stress is much higher than those of the transverse and shear stresses. In this context, the question arises of how the obtained distribution of the longitudinal stress differs from the one obtained by neglecting the variation of the elastic field parameters in the $y$ direction (one-dimensional model).

In the case of the one-dimensional model, based on relations (2.28), the longitudinal stress in the plate is defined by the following formula:

$$\frac{\sigma_{xx}(x,\tau)}{p^*} = 2\sum_{n=0}^{\infty}(-1)^n\left[\frac{1}{\xi_n} - \frac{2\xi_n\exp(-\tau_0\tau)}{\xi_n^2+\tau_0^2} + \frac{\xi_n\exp(-2\tau_0\tau)}{\xi_n^2+4\tau_0^2} + \cos(\xi_n\tau)\left(\frac{2\xi_n}{\xi_n^2+\tau_0^2} - \frac{\xi_n}{\xi_n^2+4\tau_0^2} - \frac{1}{\xi_n}\right) + \sin(\xi_n\tau)\left(\frac{2\tau_0}{\xi_n^2+4\tau_0^2} - \frac{2\tau_0}{\xi_n^2+\tau_0^2}\right)\right]\cos(\xi_n x). \tag{2.29}$$
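The one-dimensional series is simple enough to evaluate directly. The sketch below sums a truncated version of it and compares the result at the plate centre with what travelling waves predict: no stress before the fronts from both edges arrive at dimensionless time one, and twice the delayed load shortly afterwards. The truncation at 2000 terms, the choice of the centre point and the function names are illustrative assumptions.

```python
import math


def load_ratio(tau, tau0):
    """Dimensionless load (1 - exp(-tau0*tau))**2."""
    return (1.0 - math.exp(-tau0 * tau)) ** 2


def f_closed(eta, tau, tau0):
    """Closed form of the convolution of the load with sin(eta*t), divided by p*."""
    e1, e2 = math.exp(-tau0 * tau), math.exp(-2.0 * tau0 * tau)
    d1, d2 = eta ** 2 + tau0 ** 2, eta ** 2 + 4.0 * tau0 ** 2
    return (1.0 / eta - 2.0 * eta * e1 / d1 + eta * e2 / d2
            + math.cos(eta * tau) * (2.0 * eta / d1 - eta / d2 - 1.0 / eta)
            + math.sin(eta * tau) * (2.0 * tau0 / d2 - 2.0 * tau0 / d1))


def sigma_xx_1d(x, tau, tau0, terms=2000):
    """Truncated one-dimensional series for sigma_xx/p* on the interval -1 <= x <= 1."""
    total = 0.0
    for n in range(terms):
        xi = math.pi * (2 * n + 1) / 2.0
        total += (-1) ** n * f_closed(xi, tau, tau0) * math.cos(xi * x)
    return 2.0 * total


TAU0 = 5.0
BEFORE = sigma_xx_1d(0.0, 0.5, TAU0)   # wave fronts have not reached the centre yet
AFTER = sigma_xx_1d(0.0, 1.5, TAU0)    # both fronts have passed the centre once
print(BEFORE, AFTER, 2.0 * load_ratio(0.5, TAU0))
```

The doubling of the stress at the centre after the two fronts meet illustrates why the dynamic amplitude can noticeably exceed the applied load for fast loading.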


Figure 11 presents a comparison of the distributions of the dimensionless longitudinal stress inside the plate, obtained from Eqs. (2.25) and (2.29) for $\tau_0 = 5$.

One can see in Fig. 11 that the error of the one-dimensional model does not exceed 10–15% compared with the two-dimensional one. Together with the above results for the shear and transverse stresses, this allows a simple one-dimensional model to be used in applied engineering analysis.


3. Finite element modeling of energy dissipation under impact loading 


To model the dynamic elasto-plastic behavior of plates under dynamic loading, the finite element method was used [7]. Figure 12 presents the total elastic energy of the rectangular specimen under impact loading. One can see that just after the impact the elastic strain energy has its highest values; it then decreases and starts oscillating. In turn, the plastic strain energy (Fig. 13) increases monotonically up to a certain level and then remains constant, which means that the energy has already dissipated.

Figures 14 and 15 present the wave propagation and the development of plastic strain in the plate under impact loading. The plastic zones have a wave-like structure, which is also observed in experiments.


Fig. 10 Spatial distribution of transverse stresses $\sigma_{yy}(x, y, \tau)/p^*$ for different $\tau$


4. Analytic energy approach in modeling of dissipative structures


Structural materials, especially those used in the aircraft industry and aerospace engineering, are subjected to various types of impact load during their long-term service. Numerous studies have revealed a number of peculiarities in the dynamic behavior of materials at the yield point that are not observed under quasi-static loading. Recent experimental studies [2, 3] have shown that an additional impact loading imposed on prestressed samples made of elastic-plastic materials can lead to irreversible changes in the material structure, in particular to the formation of so-called dissipative structures. The latter are usually observed as bands placed at approximately equal distances from one another (Fig. 16). Thus, the structure of the material changes essentially, and it acquires new mechanical properties.

Fig. 11 Distribution of longitudinal stress in the plate for different moments of time obtained based on the two-dimensional (a) and one-dimensional (b) models

Fig. 12 Elastic energy of the specimen

Fig. 13 Plastic energy (energy dissipation)

Fig. 14 Stress wave propagation and development of plastic strain

Fig. 15 Plastic energy density

Structural changes inside the material were associated with the amount of energy
supplied to the sample during the impact loading [2, 3]. However, these studies
provided only a qualitative analysis of the physical processes involved in the
formation of dissipative structures. It was observed that an increase in the loading
rate substantially increases the magnitude of the vibrations and only slightly
influences their frequency. The vibrations are harmonic. The study of the plastic
energy shows that it increases monotonically, whereas the total strain energy rapidly
reaches its maximum and, once the plastic energy has reached its maximum, continues
to change periodically. The time at which the plastic deformation process is over is
much greater than the time required by the wave to pass through the plate. Thus, the
patterns of plastic strains are formed by both incident and scattered waves, which
can induce a standing wave.

Fig. 16 Dissipative structures obtained under additional impact loading of the aluminium alloy
2024-T3 sample

However, to date there are no quantitative approaches that allow a simplified
analysis and modeling of dissipative structures induced by impact loading. The
ordering of the obtained material structure indicates that a certain periodic process
takes place during or just after the impact loading of the material. The simplest and
most obvious periodic process that can take place after the impact is a standing
wave, which dissipates its energy at the antinodes. The latter can be related to the
bands of dissipative structures. Another approach to modeling can be found e.g. in [13-15].

Therefore, this study provides simplified analysis of the standing waves, which 
can induce dissipative structures in the elastic-plastic materials, and relates their 
wavelengths with the parameters involved in the experimental processes of additional 
impact loading of elastic-plastic materials. 

Consider a prismatic sample of length $l$ whose cross-sectional area equals $S$.
Thus, its volume equals $V = Sl$. Assume that during the impact loading the sample
obtains some amount of energy $W$. Consider an associated standing wave whose
energy equals $W$. According to [8] the mean energy density $\langle w \rangle$ of
an elastic wave equals

$$\langle w \rangle = \frac{1}{2}\rho(\Delta l\,\omega)^2, \tag{4.1}$$

where $\rho$ is the density of the material, $\omega$ is the angular frequency, and
$\Delta l$ is the amplitude of the wave. The wave amplitude $\Delta l$ can be related
to the displacement of the specimen's end during the impact loading.

The total energy inside the specimen equals

$$W = \langle w \rangle V = \langle w \rangle Sl = \frac{1}{2}\rho S l(\Delta l\,\omega)^2. \tag{4.2}$$

According to the energy conservation law, the energy $W$ induced by the impact
loading should be equal to the work $A$ produced by the external force. The work $A$
of a force equals

$$A = \int F\,dx. \tag{4.3}$$


Due to the instantaneous character of the impact loading, one can assume that the
force $F$ is a linear function of $x$, i.e.

$$F = S\left(\sigma_0 + \Delta\sigma\,\frac{x - l}{\Delta l}\right), \tag{4.4}$$

where $\sigma_0$ is the initial stress of the sample and $\Delta\sigma$ is the change
in stress during the impact loading. Substituting Eq. (4.4) into Eq. (4.3) one obtains

$$A = \int_{l}^{l+\Delta l} F\,dx = S\int_{l}^{l+\Delta l}\left(\sigma_0 + \Delta\sigma\,\frac{x - l}{\Delta l}\right)dx = S\left(\sigma_0 + \frac{1}{2}\Delta\sigma\right)\Delta l. \tag{4.5}$$

Equating Eqs. (4.2) and (4.5), the following equation can be written:

$$\left(\sigma_0 + \frac{1}{2}\Delta\sigma\right)\frac{l}{\Delta l} = \frac{1}{2}\rho(l\omega)^2. \tag{4.6}$$

Noting that the deformation of the sample during the impact equals
$\Delta\varepsilon = \Delta l / l$, Eq. (4.6) can be written as

$$\frac{\sigma_0 + 0.5\,\Delta\sigma}{\Delta\varepsilon} = \frac{1}{2}\rho(l\omega)^2. \tag{4.7}$$

The left-hand side of Eq. (4.7),

$$k = \frac{\sigma_0 + 0.5\,\Delta\sigma}{\Delta\varepsilon}, \tag{4.8}$$

can be determined experimentally from the obtained stress-strain curves, so the value
of $k$ can be readily estimated. In the case of static or quasi-static loading with
zero initial stress, $k$ is equal to half of the material's Young modulus $E$. Due to
the inertia phenomenon, impact loading is accompanied by higher stress and lower
deformations than static loading. Therefore, the constant $k$ should be bigger (or
much bigger) than the Young modulus: $k \gg E$. The value of $k$ depends on the
material properties and on the deformation rate (or stress rate).

Based on Eq. (4.7), the angular frequency of the standing wave that dissipates the
energy supplied by the impact loading is equal to

$$\omega = \frac{1}{l}\sqrt{\frac{2k}{\rho}}. \tag{4.9}$$

The wavelength of the standing wave is equal to

$$\lambda = \frac{2\pi c}{\omega} = 2\pi l c\sqrt{\frac{\rho}{2k}}, \tag{4.10}$$

where $c$ is the phase velocity. Since there are two antinodes per wavelength
$\lambda$, the distance between them, which is associated with the distance between
the dissipative bands, equals

$$d = \frac{1}{2}\lambda = \pi l c\sqrt{\frac{\rho}{2k}}. \tag{4.11}$$

The phase velocity $c$ of an elastic wave in a solid is related to its density and
Young modulus by the following dependence:

$$c = \sqrt{\frac{E_0}{\rho}}, \tag{4.12}$$

where $E_0 = \dfrac{E(1-\nu)}{(1+\nu)(1-2\nu)}$ and $\nu$ is the Poisson ratio.


According to Refs. [9-12] the phase velocity is sensitive to the plastic behavior
of the material: when the deformation is accompanied by plastic deformations, the
phase velocity is lower than in the elastic case. Thus, one of the distinctive
features of these processes is the dynamic properties of materials, which primarily
concern the theoretical foundations of materials science and, in particular, the
study of the formation of dissipative structures under impact supply of deformation
energy. Nevertheless, one can use Eq. (4.12) assuming that $E_0$ depends on strain
and strain rate. Then Eq. (4.11) can be written as

$$d = \pi l\sqrt{\frac{E_0}{2k}}. \tag{4.13}$$

Equation (4.13) gives a useful relation between the distance $d$ between dissipative
structures, the length $l$ of the specimen, and the characteristic parameter $k$,
which accounts for the impact loading and the initial stress of the sample. However,
only the value of $d$ can be determined easily, for example by metallographic
analysis. The impact modulus $k$ given by Eq. (4.8) can be determined from the
stress-strain curve, but the latter must be plotted precisely. This is because the
impact loading is instantaneous, and thus, due to the equipment inertia, it is rather
hard to determine true values of the stress and strain increments. Another issue is
to determine the value of $E_0$ for the elastic-plastic material. Further
experimental studies are needed to estimate the parameters involved in Eq. (4.13).

However, Eq. (4.13) allows the phenomenon of dissipative structures to be explained
from the classical viewpoint of continuum mechanics. According to the analysis
following Eq. (4.8), for quasi-static loading $2k \approx E_0$, and Eq. (4.13) gives
$d \approx \pi l > l$. Thus, no dissipative structures can be observed under
quasi-static loading, since the expected distance between them is greater than the
length $l$ of the specimen. This agrees with the obvious fact that no dynamic
processes are observed under quasi-static loading.

In contrast, for impact loading $k \gg E_0$; therefore, according to Eq. (4.13),
$d \ll l$. Thus, the phenomenon of dissipative structures and their band-like
distribution can be explained by the transient elastic-plastic dissipative waves
induced by the impact. Besides, according to Eq. (4.8), the higher the initial stress
$\sigma_0$, the higher is $k$. The highest values of $k$ are obtained when the initial
stress $\sigma_0$ is approximately equal to the yield stress $\sigma_y$, since at this
point the strain increment $\Delta\varepsilon$ is still small compared with that for
higher $\sigma_0$. This behavior agrees well with the experimental studies [2, 3].
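
The two limiting cases discussed above can be tried numerically. The following is a minimal Python sketch, assuming Eq. (4.9) reads $\omega = (1/l)\sqrt{2k/\rho}$ and Eq. (4.13) reads $d = \pi l\sqrt{E_0/(2k)}$; the function names and all numerical inputs are illustrative assumptions, not measured values from the experiments cited in this chapter.

```python
import math


def standing_wave_frequency(l, k, rho):
    """Angular frequency of the standing wave, Eq. (4.9): omega = sqrt(2k/rho) / l."""
    return math.sqrt(2.0 * k / rho) / l


def band_spacing(l, e0, k):
    """Distance between dissipative bands, Eq. (4.13): d = pi * l * sqrt(E0 / (2k))."""
    return math.pi * l * math.sqrt(e0 / (2.0 * k))


# Illustrative inputs only (assumed for this sketch).
l = 0.1        # specimen length, m
e0 = 100e9     # modulus E0, Pa
for ratio in (0.5, 10.0, 100.0, 1000.0):   # ratio k / E0
    d = band_spacing(l, e0, ratio * e0)
    print(f"k/E0 = {ratio:7.1f}  ->  d/l = {d / l:.4f}")
```

For $2k = E_0$ the sketch returns $d = \pi l$, which exceeds the specimen length, while for $k \gg E_0$ the spacing drops well below $l$, in line with the qualitative argument above.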


2 Conclusions 


Based on the mathematical model of the viscoplastic phenomenon and the elastic
solution of the dynamic problem for a piecewise homogeneous half-strip, the influence
of the strain energy dissipation and the loading rate on the changes in yield stress
is studied.

It is shown that accounting for the equipment inertia (the phenomenon of strain
energy dissipation) causes a change in the yield stress during the deformation
process and can be the basis for a scientific explanation of the experimentally
observed nonclassical behavior of materials under limiting high-speed loads.

It is also shown that the change in the yield stress during dynamic deformation is
essentially influenced by the speed of external loading (loading rate).

Based on the double Fourier-Laplace integral transform, the exact closed-form
solution is obtained for the two-dimensional dynamic problem of elasticity of a
rectangular plate, which models its interaction with a rigid half-strip.

A numerical analysis of the stress state of the plate is carried out and compared
with the corresponding one-dimensional problem. It is shown that the simpler 1D
solution has an error of less than 15% compared with the exact solution of the 2D
problem, so it can be used in engineering calculations of the dynamic loading of
plate-like structural elements.

This study also considers the formation of dissipative structures in an
elastic-plastic aluminium alloy and relates it to the standing wave initiated by the
impact supply of mechanical energy. Due to the irreversible plastic deformations this
wave dissipates its energy at the antinodes, thus forming the bands of dissipative
structures. The simplified analysis results in a simple equation relating the
distance between dissipative structures, the length of the specimen, and the
characteristic parameter that accounts for the impact loading and the initial stress
of the specimen. Thus, a quantitative analysis can be carried out to assess the
properties of elastic-plastic materials after the impact loading of prestressed
samples. It is shown that accounting for the equipment inertia is crucial for
assessing the parameters involved in the quantitative model of the nonclassical
behavior of materials under limiting high-speed loads and of the formation of
dissipative structures. Besides, one of the distinctive features of these processes
is the dynamic properties of materials, which primarily concern the theoretical
foundations of materials science and, in particular, the study of the formation of
dissipative structures under the impact supply of deformation energy.

Formulas for the Sums of the Series of Reciprocals of the Polynomial of Degree Two
with Non-zero Integer Roots


Radovan Potůček


Abstract This chapter deals with the sums of the series of reciprocals of
quadratic polynomials with non-zero integer roots. It is a follow-up and a completion
of the author's previous papers dealing with the sums of these series, where the
quadratic polynomials have all possible types of positive and negative integer roots.
First, conditions on the coefficients of a reduced quadratic equation are given so
that this equation has only integer roots. Further, summary formulas for the sums of
the series of reciprocals of the quadratic polynomials with non-zero integer roots are
stated and derived. These formulas are verified by some examples using the basic
programming language of the computer algebra system Maple. The series we deal
with thus belong to special types of infinite series whose sums are given
analytically by simple formulas.


Keywords Sum of the series · Harmonic number · Generalized harmonic number ·
Roots of the reduced quadratic equation · Telescoping series · Computer algebra
system Maple


1 Introduction and Basic Notions 


Before we pick up the threads of the author's preceding papers [1-6], let us recall
the basic terms and notions. The series

$$\sum_{k=1}^{\infty} a_k = a_1 + a_2 + a_3 + \cdots$$

converges to a limit $s$ if and only if the sequence of its partial sums $s_n$
converges to $s$, i.e.

$$\lim_{n\to\infty} s_n = \lim_{n\to\infty}(a_1 + a_2 + \cdots + a_n) = s.$$

We say that this series has the sum $s$ and write

$$\sum_{k=1}^{\infty} a_k = s.$$

The nth harmonic number $H(n)$ is the sum of the reciprocals of the first $n$ natural
numbers

$$H(n) = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = \sum_{k=1}^{n}\frac{1}{k},$$

$H(0)$ being defined as 0.

The generalized harmonic number $H(n, r)$ of order $n$ in power $r$ is the sum

$$H(n, r) = \sum_{k=1}^{n}\frac{1}{k^r},$$

where $H(n, 1) = H(n)$ are the harmonic numbers. The generalized harmonic number of
order $n$ in power 2 can be written as a function of harmonic numbers using the
formula (see [6] and [7])

$$H(n, 2) = \sum_{k=1}^{n-1}\frac{H(k)}{k(k+1)} + \frac{H(n)}{n}.$$


For r = 1, 2 and for n = 1, 2, ..., 10 we get the values given in Fig. 1.

The Wolfram|Alpha® widget [8] can also be used to calculate these finite sums.

 n  | H(n)      | H(n, 2)
 1  | 1         | 1
 2  | 3/2       | 5/4
 3  | 11/6      | 49/36
 4  | 25/12     | 205/144
 5  | 137/60    | 5269/3600
 6  | 49/20     | 5369/3600
 7  | 363/140   | 266681/176400
 8  | 761/280   | 1077749/705600
 9  | 7129/2520 | 9778141/6350400
 10 | 7381/2520 | 1968329/1270080

Fig. 1 The values of harmonic numbers H(n) and generalized harmonic numbers H(n, 2) for
n = 1, 2, ..., 10. Source Own calculation in Maple 16

The sum of the reciprocals of some positive integers is generally a sum of unit
fractions. For example, the sum of the reciprocals of the square numbers (the Basel
problem) is

$$\sum_{k=1}^{\infty}\frac{1}{k^2} = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots = \frac{\pi^2}{6} = \zeta(2) \doteq 1.6449,$$

where $\zeta$ is the Riemann zeta function. Similarly,

$$\sum_{k=1}^{\infty}\frac{1}{k^3} = 1 + \frac{1}{8} + \frac{1}{27} + \frac{1}{64} + \cdots = \zeta(3) \doteq 1.2021,$$

$$\sum_{k=1}^{\infty}\frac{1}{k^4} = 1 + \frac{1}{16} + \frac{1}{81} + \frac{1}{256} + \cdots = \frac{\pi^4}{90} = \zeta(4) \doteq 1.0823.$$
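
The harmonic numbers of Fig. 1 and the formula expressing $H(n, 2)$ through ordinary harmonic numbers can be cross-checked in exact rational arithmetic. The following is a minimal Python sketch (the chapter itself works in Maple); the function names `harmonic` and `harmonic2_via_h` are assumptions made for this illustration.

```python
from fractions import Fraction


def harmonic(n, r=1):
    """Generalized harmonic number H(n, r) = sum_{k=1}^{n} 1/k**r, with H(0, r) = 0."""
    return sum((Fraction(1, k**r) for k in range(1, n + 1)), Fraction(0))


def harmonic2_via_h(n):
    """H(n, 2) written through ordinary harmonic numbers H(k), for n >= 1."""
    inner = sum((harmonic(k) / (k * (k + 1)) for k in range(1, n)), Fraction(0))
    return inner + harmonic(n) / n


# Print the two rows of Fig. 1 as exact fractions.
for n in range(1, 11):
    print(n, harmonic(n), harmonic(n, 2))
```

Because `Fraction` keeps every value exact, the comparison between the two expressions for $H(n, 2)$ is an equality test rather than a floating-point approximation.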


Another interesting example is the sum of the reciprocals of the palindromic
numbers. A palindromic number remains the same when its digits are reversed,
so it is a special type of "symmetric" number. These numbers form the sequence


(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 22, 33, 44, 55, 66, 77, 88, 99, 101, 111, 121, 131, 141, ...)


and the sum of their reciprocals, except for the first term, is approximately 3.3703
(see e.g. [9-11]).
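A telescoping series is a series whose partial sums eventually have only a fixed
number of terms after cancellation. For example, the series

$$\sum_{k=1}^{\infty}\frac{1}{k(k+2)}$$

simplifies as

$$\sum_{k=1}^{\infty}\frac{1}{2}\left(\frac{1}{k} - \frac{1}{k+2}\right) = \frac{1}{2}\lim_{n\to\infty}\sum_{k=1}^{n}\left(\frac{1}{k} - \frac{1}{k+2}\right) = \frac{1}{2}\lim_{n\to\infty}\left(1 + \frac{1}{2} - \frac{1}{n+1} - \frac{1}{n+2}\right) = \frac{3}{4}.$$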
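
The cancellation in this example can be checked term by term. The sketch below, written in Python with exact fractions, compares the directly accumulated partial sums of $\sum 1/(k(k+2))$ with the telescoped closed form $\frac{1}{2}\left(\frac{3}{2} - \frac{1}{n+1} - \frac{1}{n+2}\right)$; the function names are assumptions made for this illustration.

```python
from fractions import Fraction


def partial_sum(n):
    """Direct partial sum s_n = sum_{k=1}^{n} 1/(k(k+2))."""
    return sum((Fraction(1, k * (k + 2)) for k in range(1, n + 1)), Fraction(0))


def closed_form(n):
    """Telescoped partial sum: only four terms survive the cancellation."""
    return Fraction(1, 2) * (Fraction(3, 2) - Fraction(1, n + 1) - Fraction(1, n + 2))


for n in (1, 2, 10, 100):
    print(n, partial_sum(n), float(Fraction(3, 4) - partial_sum(n)))
```

The gap to the limit $3/4$ is $\frac{1}{2}\left(\frac{1}{n+1} + \frac{1}{n+2}\right)$, so it shrinks roughly like $1/n$.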


2 Integer Roots of the Reduced Quadratic Equations


Let us consider a reduced quadratic equation, i.e. an equation of the form

$$x^2 + px + q = 0, \tag{1}$$

with an unknown $x$ and with integer coefficients $p$, $q$ (the leading coefficient
is one). It is a well-known fact that the roots $x_{1,2}$ of this quadratic equation
have the form

$$x_{1,2} = \frac{-p \pm \sqrt{p^2 - 4q}}{2}. \tag{2}$$

First, we derive the condition on the coefficients $p$, $q$ of Eq. (1) under which
this equation has only integer roots. More precisely, we derive a condition on the
coefficient $q$ for a given coefficient $p$ so that Eq. (1) has only integer roots.
If the roots $x_1$ and $x_2$ are to be integers, then for some integer $n$ the equality

$$2n = -p \pm \sqrt{p^2 - 4q}$$

must hold. Moving $p$ to the left-hand side and squaring this equality, we obtain

$$(2n + p)^2 = p^2 - 4q.$$

After cancelling the terms $p^2$ and dividing by four we get the relation

$$n^2 + pn + q = 0,$$

whence

$$q = -n(n + p),$$

so the quadratic equation with only integer roots has, for a given integer
coefficient $p$ and an arbitrary integer $n$, the form

$$x^2 + px - n(n + p) = 0. \tag{3}$$

 p\n |  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6
  -6 | -72 -55 -40 -27 -16  -7   0   5   8   9   8   5   0
  -5 | -66 -50 -36 -24 -14  -6   0   4   6   6   4   0  -6
  -4 | -60 -45 -32 -21 -12  -5   0   3   4   3   0  -5 -12
  -3 | -54 -40 -28 -18 -10  -4   0   2   2   0  -4 -10 -18
  -2 | -48 -35 -24 -15  -8  -3   0   1   0  -3  -8 -15 -24
  -1 | -42 -30 -20 -12  -6  -2   0   0  -2  -6 -12 -20 -30
   0 | -36 -25 -16  -9  -4  -1   0  -1  -4  -9 -16 -25 -36
   1 | -30 -20 -12  -6  -2   0   0  -2  -6 -12 -20 -30 -42
   2 | -24 -15  -8  -3   0   1   0  -3  -8 -15 -24 -35 -48
   3 | -18 -10  -4   0   2   2   0  -4 -10 -18 -28 -40 -54
   4 | -12  -5   0   3   4   3   0  -5 -12 -21 -32 -45 -60
   5 |  -6   0   4   6   6   4   0  -6 -14 -24 -36 -50 -66
   6 |   0   5   8   9   8   5   0  -7 -16 -27 -40 -55 -72


Fig. 2 The values of coefficient q = -n(n + p) for p = -6, ..., 5, 6 and n = -6, ..., 5, 6.
Source Own calculation in Maple 16


Now, we compute some values of the coefficient $q$ as a function of the integer
coefficient $p$ and the integer $n$. The values in Fig. 2 were obtained by the simple
commands


for p from -10 to 10 do
  print("****p=",p,"****");
  for n from -10 to 10 do
    print("q(",p,n,")=",evalf(-n*(n+p)));
  end do;
end do;


in the computer algebra system Maple 16. Fig. 2 presents only $13^2 = 169$ of all
$21^2 = 441$ computed values.
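
For readers without access to Maple, the same table can be regenerated with a few lines of Python. This is only a sketch of the computation $q = -n(n + p)$ described above; the helper name `q_coefficient` and the output layout are assumptions made for this illustration.

```python
def q_coefficient(p, n):
    """Coefficient q = -n(n + p) for which x**2 + p*x + q = 0 has the integer root x = n."""
    return -n * (n + p)


values = range(-6, 7)

# Header row with the values of n, then one row per value of p.
print(" p\\n | " + " ".join(f"{n:4d}" for n in values))
for p in values:
    print(f"{p:4d} | " + " ".join(f"{q_coefficient(p, n):4d}" for n in values))
```

By construction, both $x = n$ and $x = -p - n$ are roots of $x^2 + px + q = 0$ for every entry of the table, since their sum is $-p$ and their product is $-n(n+p) = q$.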
For even values of the coefficient $p$ we get the following values of the coefficient
$q$ ($n \in \mathbb{N}_0$):

$p = 0$: $q = 0, -1, -4, -9, -16, -25, \ldots$,
i.e. $q = 0 - 0,\ 0 - 1,\ 0 - 4,\ 0 - 9,\ 0 - 16,\ 0 - 25, \ldots$,
i.e. $q = 0 - n^2$,

$p = \pm 2$: $q = 1, 0, -3, -8, -15, -24, \ldots$,
i.e. $q = 1 - 0,\ 1 - 1,\ 1 - 4,\ 1 - 9,\ 1 - 16,\ 1 - 25, \ldots$,
i.e. $q = 1 - n^2$,

$p = \pm 4$: $q = 4, 3, 0, -5, -12, -21, \ldots$,
i.e. $q = 4 - 0,\ 4 - 1,\ 4 - 4,\ 4 - 9,\ 4 - 16,\ 4 - 25, \ldots$,
i.e. $q = 4 - n^2$,

$p = \pm 6$: $q = 9, 8, 5, 0, -7, -16, \ldots$,
i.e. $q = 9 - 0,\ 9 - 1,\ 9 - 4,\ 9 - 9,\ 9 - 16,\ 9 - 25, \ldots$,
i.e. $q = 9 - n^2$,

$p = \pm 8$: $q = 16, 15, 12, 7, 0, -9, \ldots$,
i.e. $q = 16 - 0,\ 16 - 1,\ 16 - 4,\ 16 - 9,\ 16 - 16,\ 16 - 25, \ldots$,
i.e. $q = 16 - n^2$,

$p = \pm 10$: $q = 25, 24, 21, 16, 9, 0, \ldots$,
i.e. $q = 25 - 0,\ 25 - 1,\ 25 - 4,\ 25 - 9,\ 25 - 16,\ 25 - 25, \ldots$,
i.e. $q = 25 - n^2$.

Altogether, if the coefficient $p$ is even, we have the following formula for the
coefficient $q$:

$$q = \frac{p^2}{4} - n^2, \quad n \in \mathbb{N}_0. \tag{4}$$

This form of the coefficient $q$ guarantees that the quadratic Eq. (1) has, for an
even coefficient $p$, only integer roots $x_{1,2}$.
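
For odd values of the coefficient $p$ we get the following values of the coefficient
$q$ ($n \in \mathbb{N}_0$):

$p = \pm 1$: $q = 0, -2, -6, -12, -20, -30, \ldots$,
i.e. $q = 0 - 1 \cdot 0,\ 0 - 2 \cdot 1,\ 0 - 3 \cdot 2,\ 0 - 4 \cdot 3,\ 0 - 5 \cdot 4, \ldots$,
i.e. $q = 0 - (n + 1)n$,

$p = \pm 3$: $q = 2, 0, -4, -10, -18, -28, \ldots$,
i.e. $q = 2 - 1 \cdot 0,\ 2 - 2 \cdot 1,\ 2 - 3 \cdot 2,\ 2 - 4 \cdot 3,\ 2 - 5 \cdot 4, \ldots$,
i.e. $q = 2 - (n + 1)n$,

$p = \pm 5$: $q = 6, 4, 0, -6, -14, -24, \ldots$,
i.e. $q = 6 - 1 \cdot 0,\ 6 - 2 \cdot 1,\ 6 - 3 \cdot 2,\ 6 - 4 \cdot 3,\ 6 - 5 \cdot 4, \ldots$,
i.e. $q = 6 - (n + 1)n$,

$p = \pm 7$: $q = 12, 10, 6, 0, -8, -18, \ldots$,
i.e. $q = 12 - 1 \cdot 0,\ 12 - 2 \cdot 1,\ 12 - 3 \cdot 2,\ 12 - 4 \cdot 3, \ldots$,
i.e. $q = 12 - (n + 1)n$,

$p = \pm 9$: $q = 20, 18, 14, 8, 0, -10, \ldots$,
i.e. $q = 20 - 1 \cdot 0,\ 20 - 2 \cdot 1,\ 20 - 3 \cdot 2,\ 20 - 4 \cdot 3, \ldots$,
i.e. $q = 20 - (n + 1)n$.

Altogether, if the coefficient $p$ is odd, we have the following formula for the
coefficient $q$:

$$q = \frac{p^2 - 1}{4} - n(n + 1), \quad n \in \mathbb{N}_0. \tag{5}$$

This form of the coefficient $q$ guarantees that the quadratic Eq. (1) has, for an
odd coefficient $p$, only integer roots $x_{1,2}$.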
Let us note that the terms of the sequence

$$(n(n + 1))_{n \in \mathbb{N}_0} = (0, 2, 6, 12, 20, 30, 42, 56, 72, 90, \ldots)$$

are the so-called pronic or rectangular numbers. The nth pronic number is twice
the nth triangular number $T_n$. The triangular numbers
$T_n = n(n + 1)/2 = \binom{n+1}{2}$ form the sequence

$$(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, \ldots).$$


Example 1 Decide which of the following quadratic equations have integer roots
without solving these equations by the quadratic formula (2):

(a) $x^2 + 10x - 18 = 0$, (b) $x^2 + 10x + 16 = 0$,
(c) $x^2 + 11x - 10 = 0$, (d) $x^2 + 11x - 12 = 0$.


Solution

(a) We have $p = 10$, i.e. $p/2 = 5$, so $p^2/4 = 25$, and $q = -18$. Using Formula
(4) for even $p$ we get $q = -18 \neq 25 - n^2$ for every integer $n$, because there
is no integer $n$ satisfying the equation $n^2 = 43$. Indeed, the sequence
$(25 - n^2)_{n \in \mathbb{N}_0}$ has, among others, the terms
$\ldots, -24, -11, 0, 9, \ldots$, but not the term $-18$. So the quadratic equation
$x^2 + 10x - 18 = 0$ does not have integer roots (its roots are the real numbers
$x_{1,2} = (-10 \pm \sqrt{100 + 4 \cdot 18})/2$, so $x_1 \doteq -11.56$,
$x_2 \doteq 1.56$).

(b) We have $p = 10$, so $p^2/4 = 25$, and $q = 16$. Using Formula (4) for even $p$
we get $q = 16 = 25 - n^2$, hence $n^2 = 9$, so $n = 3$. We search for
$n \in \mathbb{N}_0$, so we do not consider the negative root $n = -3$. Therefore the
quadratic equation $x^2 + 10x + 16 = 0$ has integer roots. Because we can write
$x^2 + 10x + 16 = (x + 8)(x + 2)$, the roots are obviously $x_1 = -8$ and $x_2 = -2$.

(c) We have $p = 11$, so $(p^2 - 1)/4 = 30$, and $q = -10$. Using Formula (5) for odd
$p$ we get $q = -10 \neq 30 - n(n + 1)$ for every integer $n$, because there is no $n$
satisfying the equation $n^2 + n = 40$ (the sequence
$(n^2 + n)_{n \in \mathbb{N}_0} = (n(n + 1))_{n \in \mathbb{N}_0}$ has the terms
$0, 2, 6, 12, 20, 30, 42, 56, \ldots$), so the equation $x^2 + 11x - 10 = 0$ does not
have integer roots (its roots are $x_{1,2} = (-11 \pm \sqrt{121 + 4 \cdot 10})/2$,
so $x_1 \doteq -11.84$, $x_2 \doteq 0.84$).

(d) We have $p = 11$, so $(p^2 - 1)/4 = 30$, and $q = -12$. Using Formula (5) for odd
$p$ we get $q = -12 = 30 - n(n + 1)$, hence $n^2 + n = 42$, so $n = 6$. Therefore
the quadratic equation $x^2 + 11x - 12 = 0$ has integer roots. Because we can
write $x^2 + 11x - 12 = (x + 12)(x - 1)$, the roots are obviously $x_1 = -12$ and
$x_2 = 1$.
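
The decision procedure used in Example 1 can be written as a short routine. The following Python sketch assumes Formula (4) in the form $q = p^2/4 - n^2$ for even $p$ and Formula (5) in the form $q = (p^2 - 1)/4 - n(n + 1)$ for odd $p$; the function name `has_integer_roots` is an assumption made for this illustration.

```python
import math


def has_integer_roots(p, q):
    """Decide whether x**2 + p*x + q = 0 has integer roots, using Formulas (4) and (5)."""
    if p % 2 == 0:
        # Even p: p**2/4 - q must be a perfect square n**2 with n in N0.
        m = p * p // 4 - q
        return m >= 0 and math.isqrt(m) ** 2 == m
    # Odd p: (p**2 - 1)/4 - q must be a pronic number n*(n + 1) with n in N0.
    m = (p * p - 1) // 4 - q
    if m < 0:
        return False
    n = (math.isqrt(4 * m + 1) - 1) // 2   # candidate n from n**2 + n - m = 0
    return n * (n + 1) == m


# The four equations of Example 1: (a), (b), (c), (d).
for p, q in [(10, -18), (10, 16), (11, -10), (11, -12)]:
    print(f"x^2 + ({p})x + ({q}) = 0: integer roots -> {has_integer_roots(p, q)}")
```

The routine never evaluates the quadratic formula (2); it only tests whether a non-negative integer is a perfect square or a pronic number, which is the point of Formulas (4) and (5).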


3 The Sum of the Series of Reciprocals of the Quadratic
Polynomials with Different Non-zero Integer Roots
with the Same Signs


Let us denote the sum of the series

$$\sum_{\substack{k=1 \\ k \neq a, b}}^{\infty}\frac{1}{(k - a)(k - b)}$$

of reciprocals of the quadratic polynomials with different positive integer roots $a$
and $b$, $a < b$, as $s(a^+, b^+)$. In the paper [1] it was derived that this sum is
given by the formula

$$s(a^+, b^+) = \frac{1}{b - a}\left[H(a - 1) - H(b - 1) + 2H(b - a) - 2H(b - a - 1)\right], \tag{6}$$

where $H(n)$ is the nth harmonic number.

In the paper [2] it was shown that the sum $s(a^-, b^-)$ of the series

$$\sum_{k=1}^{\infty}\frac{1}{(k - a)(k - b)} \tag{7}$$

of reciprocals of the quadratic polynomials with different negative integer roots $a$
and $b$, $a < b$, is given by the formula

$$s(a^-, b^-) = \frac{1}{b - a}\left[H(-a) - H(-b)\right]. \tag{8}$$

Let us briefly describe the method by which this formula was derived. For the other
types of roots of the quadratic equation considered below, the method is
fundamentally similar.

First, we express the nth term $a_n$ of the series (7) as the sum of two partial
fractions

$$a_n = \frac{1}{(n - a)(n - b)} = \frac{A}{n - a} + \frac{B}{n - b} = \cdots = \frac{1}{b - a}\left(\frac{1}{n - b} - \frac{1}{n - a}\right).$$

For the nth partial sum $s_n$ of the series (7) we thus get

$$s_n = \frac{1}{b - a}\left[\left(\frac{1}{1 - b} - \frac{1}{1 - a}\right) + \left(\frac{1}{2 - b} - \frac{1}{2 - a}\right) + \cdots + \left(\frac{1}{n - 1 - b} - \frac{1}{n - 1 - a}\right) + \left(\frac{1}{n - b} - \frac{1}{n - a}\right)\right].$$

The first terms that cancel each other are the terms for which, for a suitable index
$\ell$, it holds that $1/(1 - a) = 1/(\ell - b)$. Therefore the last term from the
beginning of the expression of the nth partial sum $s_n$ that does not cancel is the
term $1/(-a)$, so the first terms of $s_n$ that do not cancel generate the sum

$$\frac{1}{1 - b} + \frac{1}{2 - b} + \cdots + \frac{1}{-a}.$$

Analogously, the last terms that cancel each other are the terms for which, for a
suitable index $m$, it holds that $1/(n - b) = 1/(m - a)$. Therefore the first term
from the end of the expression of the nth partial sum $s_n$ that does not cancel is
the term $1/(n + 1 - b)$, so the last terms of $s_n$ that do not cancel generate the
sum

$$-\frac{1}{n + 1 - b} - \frac{1}{n + 2 - b} - \cdots - \frac{1}{n - a}.$$

After cancellation of all the inner terms with opposite signs we get the nth
partial sum

$$s_n = \frac{1}{b - a}\left(\frac{1}{1 - b} + \frac{1}{2 - b} + \cdots + \frac{1}{-a} - \frac{1}{n + 1 - b} - \frac{1}{n + 2 - b} - \cdots - \frac{1}{n - a}\right).$$

Because for every integer $c$ it holds that $\lim_{n \to \infty} 1/(n + c) = 0$, the
sought sum is

$$s(a^-, b^-) = \lim_{n \to \infty} s_n = \frac{1}{b - a}\left(\frac{1}{1 - b} + \frac{1}{2 - b} + \cdots + \frac{1}{-a}\right) = \frac{1}{b - a}\left[H(-a) - H(-b)\right].$$
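
The telescoping argument can be verified exactly for concrete negative roots. The sketch below, in Python with rational arithmetic, compares the directly accumulated partial sum of series (7) with its telescoped form, in which the uncancelled head equals $H(-a) - H(-b)$ and the uncancelled tail equals $H(n - a) - H(n - b)$; the function names are assumptions made for this illustration, and the code presumes integers $a < b < 0$.

```python
from fractions import Fraction


def H(n):
    """Harmonic number H(n) as an exact fraction, with H(0) = 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def partial_sum(a, b, n):
    """Direct partial sum of series (7) for negative integer roots a < b < 0."""
    return sum((Fraction(1, (k - a) * (k - b)) for k in range(1, n + 1)), Fraction(0))


def partial_sum_telescoped(a, b, n):
    """Partial sum after cancellation of the inner terms."""
    head = H(-a) - H(-b)          # 1/(1-b) + 1/(2-b) + ... + 1/(-a)
    tail = H(n - a) - H(n - b)    # 1/(n+1-b) + 1/(n+2-b) + ... + 1/(n-a)
    return (head - tail) / (b - a)


a, b = -5, -2
for n in (1, 10, 100):
    print(n, partial_sum(a, b, n) == partial_sum_telescoped(a, b, n))
print("limit from Formula (8):", (H(-a) - H(-b)) / (b - a))
```

Since the tail consists of $b - a$ terms, each smaller than $1/n$, the partial sums approach the value of Formula (8) from below with an error below $1/n$.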


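Now, let us try to write Formulas (6) and (8) for the sums $s(a^+, b^+)$ and
$s(a^-, b^-)$ as a single common formula. Because clearly

$$H(b - a) - H(b - a - 1) = \frac{1}{b - a},$$

we can rewrite Formula (6) in the form

$$s(a^+, b^+) = \frac{1}{b - a}\left[H(a - 1) - H(b - 1) + \frac{2}{b - a}\right],$$

i.e. in the form

$$s(a^+, b^+) = \frac{H(a - 1) - H(b - 1)}{b - a} + \frac{2}{(b - a)^2}. \tag{9}$$

Similarly, we can rewrite Formula (8) in the form

$$s(a^-, b^-) = \frac{H(|a|) - H(|b|)}{b - a} + \frac{0}{(b - a)^2}. \tag{10}$$

Altogether, from Formulas (9) and (10) we get the common formula

$$s(a^\rho, b^\rho) = \frac{H\left[|a| - \frac{1 + \operatorname{sgn}(a)}{2}\right] - H\left[|b| - \frac{1 + \operatorname{sgn}(b)}{2}\right]}{b - a} + \frac{2 + \operatorname{sgn}(a) + \operatorname{sgn}(b)}{2(b - a)^2},$$

where $\rho$ is the sign (+ or -) of the integer roots $a$ and $b$ and where

$$\operatorname{sgn}(n) = \begin{cases} -1 & \text{for } n < 0, \\ 0 & \text{for } n = 0, \\ 1 & \text{for } n > 0 \end{cases}$$

is the signum function of a number $n$.

If we denote

$$\frac{1 + \operatorname{sgn}(n)}{2} = \begin{cases} 0 & \text{for } n < 0, \\ 1/2 & \text{for } n = 0, \\ 1 & \text{for } n > 0 \end{cases} \tag{11}$$

as $\sigma(n)$ and because it holds that

$$\frac{2 + \operatorname{sgn}(a) + \operatorname{sgn}(b)}{2(b - a)^2} = \frac{2\{[1 + \operatorname{sgn}(a)]/2 + [1 + \operatorname{sgn}(b)]/2\}}{2(b - a)^2} = \frac{[1 + \operatorname{sgn}(a)]/2 + [1 + \operatorname{sgn}(b)]/2}{(b - a)^2},$$

we can write the preceding formula in a clearer and shorter form

$$s(a^\rho, b^\rho) = \frac{H[|a| - \sigma(a)] - H[|b| - \sigma(b)]}{b - a} + \frac{\sigma(a) + \sigma(b)}{(b - a)^2}. \tag{12}$$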


4 The Sum of the Series of Reciprocals of the Quadratic 
Polynomials with Double Non-zero Integer Roots 


In the paper [3] we derived that the sum $s(a^-, a^-)$ of the series

$$\sum_{k=1}^{\infty}\frac{1}{(k - a)^2}$$

of reciprocals of the quadratic polynomials with double negative integer root $a$ is
given by the formula

$$s(a^-, a^-) = \frac{\pi^2}{6} - H(-a, 2), \tag{13}$$

where $H(-a, 2)$ is the generalized harmonic number of order $-a$ in power 2.

The formula for the sum $s(a^+, a^+)$ of the series

$$\sum_{\substack{k=1 \\ k \neq a}}^{\infty}\frac{1}{(k - a)^2}$$

of reciprocals of the quadratic polynomials with double positive integer root $a$ was
derived in the paper [4] and has the form

$$s(a^+, a^+) = \frac{\pi^2}{6} + H(a - 1, 2). \tag{14}$$

Now, we determine a single common formula for the sum $s(a^\rho, a^\rho)$ including
both Formulas (13) and (14) for the sums $s(a^-, a^-)$ and $s(a^+, a^+)$. As regards
the second summands in Formulas (13) and (14), which are the generalized harmonic
numbers $H(-a, 2)$, $a < 0$, and $H(a - 1, 2)$, $a > 0$, for a negative integer $a$ we
can write

$$-a = |a| - \frac{1 + \operatorname{sgn}(a)}{2}$$

and for a positive integer $a$ we can write

$$a - 1 = |a| - \frac{1 + \operatorname{sgn}(a)}{2}.$$

Hence we get the following common formula for the sum $s(a, a)$ including both
Formulas (13) and (14) for the sums $s(a^-, a^-)$ and $s(a^+, a^+)$:

$$s(a, a) = \frac{\pi^2}{6} + \operatorname{sgn}(a) \cdot H\left[|a| - \frac{1 + \operatorname{sgn}(a)}{2}, 2\right].$$

If we once again denote

$$\sigma(a) = \frac{1 + \operatorname{sgn}(a)}{2},$$

we get the following result.

Theorem 1 The sum $s(a, a)$ of the series

$$\sum_{\substack{k=1 \\ k \neq a}}^{\infty}\frac{1}{(k - a)^2}$$

of reciprocals of the quadratic polynomials with double positive or double negative
integer root $a$ has the form

$$s(a, a) = \frac{\pi^2}{6} + \operatorname{sgn}(a) \cdot H[|a| - \sigma(a), 2], \tag{15}$$

where

$$\sigma(a) = \frac{1 + \operatorname{sgn}(a)}{2}$$

and $H(n, 2)$ is the generalized harmonic number of order $n$ in power 2.
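
Formula (15) is easy to try out numerically. The following Python sketch evaluates it in floating-point arithmetic and compares it with a truncated direct sum; the function names and the truncation bound are assumptions made for this illustration, and the code presumes a non-zero integer $a$.

```python
import math


def sgn(n):
    """Signum function of an integer n."""
    return (n > 0) - (n < 0)


def sigma(n):
    """sigma(n) = (1 + sgn(n)) / 2, used here for non-zero integers only (0 or 1)."""
    return (1 + sgn(n)) // 2


def harmonic2(n):
    """Generalized harmonic number H(n, 2) in floating point."""
    return math.fsum(1.0 / k**2 for k in range(1, n + 1))


def s_double_root(a):
    """Formula (15): sum of 1/(k - a)**2 over k >= 1, k != a, for a non-zero integer a."""
    if a == 0:
        raise ValueError("the root a must be a non-zero integer")
    return math.pi**2 / 6 + sgn(a) * harmonic2(abs(a) - sigma(a))


def truncated_sum(a, t):
    """Direct finite sum up to k = t, skipping k = a."""
    return math.fsum(1.0 / (k - a) ** 2 for k in range(1, t + 1) if k != a)


for a in (-10, -5, -1, 1, 5, 10):
    print(a, round(s_double_root(a), 6), round(truncated_sum(a, 10**4), 6))
```

The truncated sum always lies slightly below the closed form, because the omitted tail $\sum_{k > t} 1/(k - a)^2$ is positive and of size about $1/t$.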


5 The Sum of the Series of Reciprocals of the Quadratic 
Polynomials with One Negative and One Positive Integer 
Root 


The sum $s(a^-, b^+)$ of the series

$$\sum_{\substack{k=1 \\ k \neq b}}^{\infty}\frac{1}{(k - a)(k - b)}$$

of reciprocals of the quadratic polynomials with integer roots $a < 0$ and $b > 0$ was
derived in the paper [5] and is given by the formula

$$s(a^-, b^+) = \frac{(b - a)[H(-a) - H(b - 1)] + 1}{(b - a)^2}, \tag{16}$$

which can be written in the form

$$s(a^-, b^+) = \frac{H(-a) - H(b - 1)}{b - a} + \frac{1}{(b - a)^2}. \tag{17}$$

We compare this formula with Formula (12), which has the form

$$s(a^\rho, b^\rho) = \frac{H[|a| - \sigma(a)] - H[|b| - \sigma(b)]}{b - a} + \frac{\sigma(a) + \sigma(b)}{(b - a)^2}.$$

It can be seen that the numerator of the second fraction on the right side of Formula
(12), where $a < 0$ and $b > 0$, is

$$\sigma(a) + \sigma(b) = \frac{1 + \operatorname{sgn}(a)}{2} + \frac{1 + \operatorname{sgn}(b)}{2} = \frac{1 - 1}{2} + \frac{1 + 1}{2} = 1,$$

so the second fraction on the right side of Formula (12) also covers the case of a
negative root $a$ and a positive root $b$. Now, we compare the numerators of the first
fractions on the right sides of Formulas (12) and (17). Because $\sigma(a) = 0$ for
negative $a$ and $\sigma(b) = 1$ for positive $b$, we get an equality between these
numerators:

$$H[|a| - \sigma(a)] - H[|b| - \sigma(b)] = H(|a| - 0) - H(|b| - 1) = H(-a) - H(b - 1).$$

Altogether we get the following main result of this chapter, a formula for the
sum of the series of reciprocals of the quadratic polynomials with different non-zero
integer roots.


Theorem 2 The sum $s(a, b)$ of the series

$$\sum_{\substack{k=1 \\ k \neq a, b}}^{\infty}\frac{1}{(k - a)(k - b)}$$

of reciprocals of the quadratic polynomials with different non-zero integer roots $a$
and $b$ has the form

$$s(a, b) = \frac{H[|a| - \sigma(a)] - H[|b| - \sigma(b)]}{b - a} + \frac{\sigma(a) + \sigma(b)}{(b - a)^2}, \tag{18}$$

where

$$\sigma(n) = \frac{1 + \operatorname{sgn}(n)}{2}$$

and $H(n)$ is the nth harmonic number.


Let us note that, due to the symmetry of Formula (18), the condition $a < b$ need not
be assumed, whether the integer roots $a$ and $b$ have the same or different signs.
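
Since Formula (18) involves only rational numbers, it can be evaluated exactly. The following Python sketch implements it with `fractions.Fraction` and checks the symmetry in $a$ and $b$ noted above; the function names are assumptions made for this illustration, and the code presumes distinct non-zero integers.

```python
from fractions import Fraction


def sgn(n):
    """Signum function of an integer n."""
    return (n > 0) - (n < 0)


def sigma(n):
    """sigma(n) = (1 + sgn(n)) / 2, used here for non-zero integers only (0 or 1)."""
    return (1 + sgn(n)) // 2


def H(n):
    """Harmonic number H(n) as an exact fraction, with H(0) = 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def s_distinct_roots(a, b):
    """Formula (18) for distinct non-zero integer roots a and b."""
    if a == 0 or b == 0 or a == b:
        raise ValueError("a and b must be distinct non-zero integers")
    first = (H(abs(a) - sigma(a)) - H(abs(b) - sigma(b))) / (b - a)
    second = Fraction(sigma(a) + sigma(b), (b - a) ** 2)
    return first + second


def truncated_sum(a, b, t):
    """Direct finite sum up to k = t in floating point, skipping k = a and k = b."""
    return sum(1.0 / ((k - a) * (k - b)) for k in range(1, t + 1) if k not in (a, b))


print(s_distinct_roots(-10, 5), s_distinct_roots(5, -10))
print(float(s_distinct_roots(1, 5)), truncated_sum(1, 5, 2000))
```

For roots of equal sign the result reproduces Formulas (9) and (10); for example, $a = 1$, $b = 2$ gives exactly 1, the sum of $\sum_{j \ge 1} 1/(j(j+1))$.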


Example 2 Using (i) Formulas (15) and (18), (ii) Maple commands, calculate the sums
of the series

(a) $\displaystyle\sum_{\substack{k=1 \\ k \neq 10}}^{\infty}\frac{1}{k^2 - 20k + 100}$, (b) $\displaystyle\sum_{\substack{k=1 \\ k \neq 5}}^{\infty}\frac{1}{k^2 + 5k - 50}$.


Solution First, we always make sure that the quadratic trinomial has non-zero integer
roots. We use the procedure of Example 1 and Formulas (4) and (5).

(a) We have $p = -20$, so $p^2/4 = 100$, and $q = 100$. Using Formula (4) for even
$p$ we get $q = 100 = 100 - n^2$, hence $n^2 = 0$, so $n = 0 \in \mathbb{N}_0$.
Therefore the equation $k^2 - 20k + 100 = 0$ has integer roots. Because
$k^2 - 20k + 100 = (k - 10)^2$, we can write the first series in the equivalent form

$$\sum_{\substack{k=1 \\ k \neq 10}}^{\infty}\frac{1}{(k - 10)^2}.$$

(i) Substituting $a = 10$ into Formula (15) we get the sum

$$s(10, 10) = \frac{\pi^2}{6} + \operatorname{sgn}(10) \cdot H[10 - \sigma(10), 2].$$

Since $\sigma(10) = [1 + \operatorname{sgn}(10)]/2 = 1$ and
$H(9, 2) = 9778141/6350400$, then

$$s(10, 10) = \frac{\pi^2}{6} + 1 \cdot H(9, 2) = \frac{\pi^2}{6} + \frac{9778141}{6350400} \doteq 3.184703.$$

(ii) By the basic command in Maple for calculating a sum we obtain

evalf[7](sum(1/(k-10)^2,k=1..9)+sum(1/(k-10)^2,k=11..infinity));
3.184703

(b) We have $p = 5$, so $(p^2 - 1)/4 = 6$, and $q = -50$. Using Formula (5) for
odd $p$ we get $q = -50 = 6 - n(n + 1)$, hence $n(n + 1) = 56$, so $n = 7$.
Therefore the quadratic equation $k^2 + 5k - 50 = 0$ has integer roots. Because
$k^2 + 5k - 50 = (k + 10)(k - 5)$, the roots are $k_1 = -10$ and $k_2 = 5$. Instead
of the original series we can write its equivalent form

$$\sum_{\substack{k=1 \\ k \neq 5}}^{\infty}\frac{1}{(k + 10)(k - 5)}.$$

(i) Substituting $a = -10$, $b = 5$ into Formula (18) we get the sum

$$s(-10, 5) = \frac{H[|-10| - \sigma(-10)] - H[|5| - \sigma(5)]}{5 - (-10)} + \frac{\sigma(-10) + \sigma(5)}{[5 - (-10)]^2}.$$

Because $\sigma(-10) = [1 + \operatorname{sgn}(-10)]/2 = 0$,
$\sigma(5) = [1 + \operatorname{sgn}(5)]/2 = 1$, $H(10) = 7381/2520$ and
$H(4) = 25/12$, we get

$$s(-10, 5) = \frac{H(10) - H(4)}{15} + \frac{1}{225} = \frac{2131}{2520 \cdot 15} + \frac{1}{225} = \frac{2299}{37800} \doteq 0.060820.$$

(ii) By the basic command in Maple for calculating a sum we obtain

evalf[7](sum(1/((k+10)*(k-5)),k=1..4)+sum(1/((k+10)*(k-5)),k=6..infinity));
0.060820


So we can conclude that both ways of calculating the sums of the given series gave
the same result.


6 Numerical Verification 


We solve two problems: to determine the values of the derived sums

$$s(a, a) = \frac{\pi^2}{6} + \operatorname{sgn}(a) \cdot H[|a| - \sigma(a), 2]$$

and

$$s(a, b) = \frac{H[|a| - \sigma(a)] - H[|b| - \sigma(b)]}{b - a} + \frac{\sigma(a) + \sigma(b)}{(b - a)^2}.$$

For numerical verification of the derived Formula (15) we choose six values
$a \in \{-10, -5, -1, 1, 5, 10\}$, and for numerical verification of the derived
Formula (18) we choose thirty ordered pairs $(a, b)$, where
$a, b \in \{-10, -5, -1, 1, 5, 10\}$, $a \neq b$. We use on the one hand an
approximate direct evaluation of the finite sums

$$s(a, a, t) = \sum_{\substack{k=1 \\ k \neq a}}^{t}\frac{1}{(k - a)^2} \quad\text{and}\quad s(a, b, t) = \sum_{\substack{k=1 \\ k \neq a, b}}^{t}\frac{1}{(k - a)(k - b)},$$

where $t = 10^6$, using the computer algebra system Maple 16, and on the other hand
Formula (15) for evaluating the sum $s(a, a)$ and Formula (18) for evaluating the sum
$s(a, b)$.

For direct evaluation of the finite sums $s(a, a, 10^6)$ and $s(a, b, 10^6)$ we use
the following two simple procedures rp2aa and rp2ab and the for statements that
follow them:


rp2aa:=proc(a,t)
  local k,saa,sumaa,sia,siga;
  sumaa:=0; sia:=signum(a); siga:=(1+sia)/2;
  # closed form, Formula (15)
  saa:=Pi^2/6+sia*harmonic(abs(a)-siga,2);
  print("saa(",a,")=",evalf[7](saa));
  # direct partial sum s(a,a,t), skipping the term k=a
  for k from 1 to t do
    if k<>a then
      sumaa:=sumaa+1/((k-a)^2);
    end if;
  end do;
  print("sumaa(",t,")=",evalf[7](sumaa));
  print("diff=",evalf[20](abs(sumaa-saa)));
end proc:

for a in [-10,-5,-1,1,5,10] do
  rp2aa(a,1000000);
end do;
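
The same comparison can be sketched in Python with floating-point arithmetic, which avoids the cost of exact rational summation. This is only an illustrative counterpart of rp2aa, not the authors' code: the helper names and the shorter truncation point `T = 100_000` are assumptions.

```python
import math


def sigma(a: int) -> int:
    """sigma(a) = [1 + sgn(a)] / 2 for a non-zero integer a."""
    return 1 if a > 0 else 0


def harmonic2(n: int) -> float:
    """Generalized harmonic number H(n, 2) = sum of 1/k^2 for k = 1..n."""
    return sum(1.0 / k ** 2 for k in range(1, n + 1))


def s_aa_closed(a: int) -> float:
    """Closed form of Formula (15)."""
    sgn = 1 if a > 0 else -1
    return math.pi ** 2 / 6 + sgn * harmonic2(abs(a) - sigma(a))


def s_aa_partial(a: int, t: int) -> float:
    """Direct partial sum s(a, a, t), skipping the term k = a."""
    return sum(1.0 / (k - a) ** 2 for k in range(1, t + 1) if k != a)


T = 100_000  # assumed truncation point, smaller than the 10^6 used in the text
results = {a: (s_aa_closed(a), s_aa_partial(a, T)) for a in (-10, -5, -1, 1, 5, 10)}
for a, (closed, partial) in results.items():
    print(f"a={a:4d}  closed={closed:.6f}  partial={partial:.6f}  diff={abs(closed - partial):.2e}")
```

Because the neglected tail of the series behaves like 1/t, the difference between both values shrinks roughly in proportion to the truncation point.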
rp2ab:=proc(a,b,t)
  local k,sab,sumab,siga,sigb,s1,s2;
  sumab:=0; siga:=(1+signum(a))/2; sigb:=(1+signum(b))/2;
  # closed form, Formula (18)
  s1:=(harmonic(abs(a)-siga)-harmonic(abs(b)-sigb))/(b-a);
  s2:=(siga+sigb)/((b-a)^2);
  sab:=s1+s2;
  print("sab(",a,b,")=",evalf[7](sab));
  # direct partial sum s(a,b,t), skipping the terms k=a and k=b
  for k from 1 to t do
    if k<>a then
      if k<>b then
        sumab:=sumab+1/((k-a)*(k-b));
      end if;
    end if;
  end do;
  print("sumab(",t,")=",evalf[7](sumab));
  print("diff=",evalf[20](abs(sumab-sab)));
end proc:

for a in [-10,-5,-1,1,5,10] do
  for b in [-10,-5,-1,1,5,10] do
    if a<>b then
      rp2ab(a,b,1000000);
    end if;
  end do;
end do;
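
A floating-point Python sketch of the same check over all thirty ordered pairs is given below. It is an assumed counterpart of rp2ab for illustration only: the helper names and the truncation point `T = 20_000` are not taken from the original text.

```python
from itertools import permutations

VALUES = (-10, -5, -1, 1, 5, 10)


def sigma(n: int) -> int:
    """sigma(n) = [1 + sgn(n)] / 2 for a non-zero integer n."""
    return 1 if n > 0 else 0


def harmonic(n: int) -> float:
    """n-th harmonic number H(n); H(0) = 0."""
    return sum(1.0 / k for k in range(1, n + 1))


def s_ab_closed(a: int, b: int) -> float:
    """Closed form of Formula (18) for distinct non-zero integers a and b."""
    h_diff = harmonic(abs(a) - sigma(a)) - harmonic(abs(b) - sigma(b))
    return h_diff / (b - a) + (sigma(a) + sigma(b)) / (b - a) ** 2


def s_ab_partial(a: int, b: int, t: int) -> float:
    """Direct partial sum s(a, b, t), skipping the terms k = a and k = b."""
    return sum(1.0 / ((k - a) * (k - b)) for k in range(1, t + 1) if k != a and k != b)


T = 20_000  # assumed truncation point, kept small so the sketch runs quickly
max_diff = 0.0
for a, b in permutations(VALUES, 2):  # the thirty ordered pairs with a != b
    max_diff = max(max_diff, abs(s_ab_closed(a, b) - s_ab_partial(a, b, T)))
print(f"largest |closed - partial| over all pairs: {max_diff:.2e}")
```

The loop over `permutations(VALUES, 2)` also makes the symmetry of the formula easy to inspect, since both orderings of each pair are evaluated.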
Approximate values of the sums $s(a, a)$ obtained by the procedure rp2aa and approximate values of the sums $s(a, b)$ obtained by the procedure rp2ab, all rounded to 6 decimals, are given in Figs. 3 and 4.

Let us note that the symmetry of Formula (18) is evident from the table in Fig. 4, as was mentioned in Theorem 2, and that both results $s(10, 10) = 3.184703$ and $s(-10, 5) = 0.060820$ obtained in Example 2 are practically identical to the results obtained using the procedures rp2aa and rp2ab. Computation of the 6 pairs of sums $s(a, a, 10^6)$ and $s(a, a)$ and the 30 pairs of sums $s(a, b, 10^6)$ and $s(a, b)$ took almost 10 h in total. The absolute errors, i.e. the differences $|s(a, a) - s(a, a, 10^6)|$ and $|s(a, b) - s(a, b, 10^6)|$, are all about $10^{-6}$.


a       |   -10    |    -5    |    -1    |    1     |    5     |    10
s(a, a) | 0.095167 | 0.181324 | 0.644933 | 1.644935 | 3.068546 | 3.184703

Fig. 3 Some approximate values of the sum s(a, a). Source Own calculation in Maple 16
s(a, b) |   a=-10   |   a=-5    |   a=-1    |    a=1    |    a=5    |   a=10
b=-10   |     x     |  0.129127 |  0.214330 |  0.274534 |  0.060820 |  0.007500
b=-5    |  0.129127 |     x     |  0.320833 |  0.408333 |  0.030000 | -0.031931
b=-1    |  0.214330 |  0.320833 |     x     |  0.750000 | -0.152778 | -0.158005
b=1     |  0.274534 |  0.408333 |  0.750000 |     x     | -0.395833 | -0.289638
b=5     |  0.060820 |  0.030000 | -0.152778 | -0.395833 |     x     | -0.069127
b=10    |  0.007500 | -0.031931 | -0.158005 | -0.289638 | -0.069127 |     x

Fig. 4 Some approximate values of the sum s(a, b). Source Own calculation in Maple 16


7 Conclusion 


This chapter dealt with the sums of the series of reciprocals of quadratic polynomials with non-zero integer roots, i.e. with the sums $s(a, a)$ and $s(a, b)$ of the series

$$\sum_{\substack{k=1 \\ k \ne a}}^{\infty} \frac{1}{(k - a)^2} \quad \text{and} \quad \sum_{\substack{k=1 \\ k \ne a, b}}^{\infty} \frac{1}{(k - a)(k - b)},$$

respectively. First, we derived the conditions on the coefficients $p$, $q$ of a reduced quadratic equation $x^2 + px + q = 0$ under which the equation has only integer roots. These conditions have the form

$$q = \frac{p^2}{4} - n^2, \; n \in \mathbb{N}_0 \quad \text{and} \quad q = \frac{p^2 - 1}{4} - n(n + 1), \; n \in \mathbb{N}_0$$

for even $p$ and odd $p$, respectively.
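
The two conditions can be checked by brute force against a direct discriminant test. The Python sketch below is an illustration under assumed names (`has_integer_roots`, `q_from_condition`, `integer_roots`) and is not part of the original derivation.

```python
import math


def has_integer_roots(p: int, q: int) -> bool:
    """Direct test: x^2 + p*x + q = 0 has integer roots iff the discriminant
    p^2 - 4q is a perfect square (its root then has the same parity as p)."""
    disc = p * p - 4 * q
    if disc < 0:
        return False
    root = math.isqrt(disc)
    return root * root == disc


def q_from_condition(p: int, n: int) -> int:
    """Coefficient q prescribed by the stated conditions for a given n in N0."""
    if p % 2 == 0:
        return p * p // 4 - n * n            # even p
    return (p * p - 1) // 4 - n * (n + 1)    # odd p


def integer_roots(p: int, q: int) -> tuple:
    """Both integer roots, assuming has_integer_roots(p, q) is True."""
    root = math.isqrt(p * p - 4 * q)
    return ((-p - root) // 2, (-p + root) // 2)


# The example from the text: k^2 + 5k - 50 corresponds to odd p = 5 and n = 7.
print(q_from_condition(5, 7), integer_roots(5, -50))
```

For the quadratic used in the example above, the condition for odd $p$ holds with $n = 7$, and the computed roots are the two values found there by factorization.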
Then the formula

$$s(a, a) = \frac{\pi^2}{6} + \mathrm{sgn}(a) \cdot H[|a| - \sigma(a), 2]$$

for the sum $s(a, a)$ of the first series above, where $\sigma(a) = [1 + \mathrm{sgn}(a)]/2$ and where $H(n, 2)$ is the generalized harmonic number of order $n$ in power 2, was derived. Further, the formula

$$s(a, b) = \frac{H[|a| - \sigma(a)] - H[|b| - \sigma(b)]}{b - a} + \frac{\sigma(a) + \sigma(b)}{(b - a)^2}$$

for the sum $s(a, b)$, $a \ne b$, of the second series above, where $\sigma(n) = [1 + \mathrm{sgn}(n)]/2$ and where $H(n)$ is the $n$th harmonic number, was derived.

These two main results were verified by numerical examples using the basic
programming language of the computer algebra system Maple. The series we dealt
with thus belong to the special types of infinite series whose sums are given
analytically by relatively simple formulas.
Life Expectancy at Birth and Its
Socioeconomic Determinants:
An Application of Random Forest
Algorithm


Pedro Antonio Martin Cervantes, Nuria Rueda Lopez, 
and Salvador Cruz Rambaud 


Abstract The objective of this chapter is to analyze the socioeconomic determinants of life expectancy at birth in European countries from 2000 to 2017 by using not only “traditional” variables, such as per capita income and education, but also sociocultural differences and public expenditure on social protection. The methodology employed is the Random Forest algorithm, which measures the importance of certain variables related to life expectancy at birth. Our findings show that the variables with the greatest relative importance in explaining life expectancy at birth are per capita income, the variable “Area” (representative of the sociocultural differences between Eastern and the rest of European countries), the educational level of the population, and public expenditure on social protection. Conversely, the least important variables are inflation, public environmental expenditure and unemployment. These results highlight the importance of public expenditure on social protection in the composition of the public budget in order to achieve health outcomes.


Keywords Life expectancy at birth · Relative importance · Random forest
algorithm · Socioeconomic determinants · Health outcomes


1 Introduction 


Traditionally, policy makers in developed and developing countries have been interested in improving, or at least maintaining, population health. The extent to which socioeconomic factors can affect the health status of citizens is a topic which has recently received a lot of attention, especially in developed countries, due to the economic recession [10]. In this way, the recession of 2008 led to large cuts in public expenditures, mainly healthcare and social expenditures, which can affect population health [80].

Several studies have investigated the socioeconomic determinants of health
outcomes from an individual perspective based on self-rated health data. These
studies have found that there are some differences among countries in the
relationship between socioeconomic variables and individuals’ subjective general
health status [1, 4, 23, 66].

We consider it interesting to follow a macro-level analysis because it allows us to analyze the socioeconomic determinants of health outcomes at a country level and to obtain useful economic policy recommendations. Most of the previous literature based on this macro-data approach has referred to socioeconomic factors of developed countries, such as the OECD members. To a lesser extent, the analysis has referred to a group of European countries [10, 32, 54, 59, 65, 88].

The present study is aimed at identifying and classifying the relative importance 
of the socioeconomic factors in explaining life expectancy at birth in European coun- 
tries throughout the period 2000-2017. For this purpose and according to previous 
studies, “traditional” macroeconomic indicators are included in the analysis such as 
per capita income, unemployment and inflation, besides education of the population. 
Additionally, we are going to consider public health expenditure and some compo- 
nents of the public budget related to healthcare, recently included in this type of 
research, such as social and educational expenditures [69, 81, 87] and environmental 
protection expenditures. 

A brief literature review shows that econometric models with panel data predominate
in this field of research focused on the socioeconomic determinants of health
outcomes; among them, we can highlight the works by Hitiris and Posnett [40],
Grubaugh and Rexford [35], Crémieux et al. [24], Or [67, 68], Berger and Messer 
[12], Crémieux et al. [25], Tapia-Granados [82], Nixon and Ullmann [65], Shin-Jong 
[78], Bergh and Nilson [13], Heijink et al. [38], Reynolds and Avendano [72], and 
Hill and Jorgenson [39]. 

However, in this work we will apply a Random Forests methodology based on tree- 
structured predictors, which allows classifying the socioeconomic variables included 
in the analysis according to their relative importance to explain the behaviour of life 
expectancy at birth. 

The present study is useful in three ways. First, it extends the previous literature on the socioeconomic determinants of health focused on European countries, which is less extensive so far. Second, this research offers a classification of the socioeconomic factors according to their relative importance in explaining life expectancy at birth. Third, we employ a Random Forests methodology which, to the best of our knowledge, has not been applied before to study the variables related to health outcomes in Europe.

Finally, the structure of this work is as follows. In Sect. 2, we include a review of 
the main empirical studies published to date about the socioeconomic factors which 
can explain health outcome in the context of developed countries. In Sect. 3, we 
present the Random Forests methodology. Next, in Sect. 4, we offer the main results 
of the implemented empirical analysis. Finally, Sect. 5 presents the discussion and
Sect. 6 highlights the main conclusions of this chapter. 


2 Literature Review 


The previous literature on the determinants of health status in developed countries considers that this variable is determined by a combination of not only socioeconomic factors but also health care resources, in monetary [6, 24, 36, 40, 43, 65, 84] or physical terms [22, 24, 57, 68, 88], and lifestyle factors [22, 36, 38, 61, 77]. This type of study follows a macro-level approach whose major advantage is that it allows analyzing the total effect of different factors on the health status of citizens and, therefore, obtaining suitable economic policy implications.

Turning to socioeconomic factors, most previous macro-level studies of developed countries have considered gross domestic product or per capita income [6, 44, 54, 59, 67] and, less frequently, some indicators of income distribution [12, 39, 74] or poverty [24]. Education is another socioeconomic factor, frequently included in empirical works, which can determine health [28, 46, 76], as well as the level of occupation [6, 67, 68].

Additional macroeconomic variables employed as determinants of health status 
are unemployment [10, 48, 78, 82], inflation [7] and gross capital formation [62]. 

Within the group of socioeconomic factors, some works include demographic 
variables such as older population [40, 92] or variables of institutional nature such 
as the type of health system [32, 35], the level of fiscal decentralization, corruption 
and political rights [73] or the welfare state regime [45]. 

In this type of literature it is usual to include other specific variables such as 
pollution [2, 62, 65]; however, it is less frequent to consider an environmental 
quality indicator [56, 86], a measure of globalization [13] or some indicators of 
social development [43] as a possible determinant of health outcomes. 

It should be noted that, recently, there has been a growing interest in including 
new categories of health-related expenditures which can be considered as socioeco- 
nomic factors. This is the case of social expenditures [3, 11, 14, 15, 45, 60, 80, 89], 
educational expenditures [72, 88], and environmental expenditures [58]. 

Due to the fact that specialized empirical studies in this field have considered the 
health care expenditures financed by the public sector as a resource factor which 
can impact health status [12, 22, 32, 50, 51, 67, 68, 70, 76, 85], we are interested in 
extending the public expenditures included in this type of research. For this reason,
we are going to consider not only several macroeconomic indicators but also the
public budget components related to the health of the population.

Although previous studies have provided valuable relationships between socioeconomic variables and life expectancy at birth, their results do not offer a classification of these variables that uses their relative importance in explaining the health outcome as the classification criterion. For this reason, a Random Forests-based methodology is employed in this work in order to analyze life expectancy at birth and classify its influencing socioeconomic factors in twenty-five European countries from 2000 to 2017.


3 Method 


The basic approach of building multiple decision trees in a repetitive way predates Breiman [17, 18], since the essence of this methodology can already be found in Williams’ MIL (Multiple Inductive Learning) algorithm [90, 91], which led to the construction of several decision trees combined in a single model whose predictive power was improved by selecting different variables attributable to each node. Ho [41, 42], in turn, developed the intuitive concept of “random sampling of variables” applied to decision trees, from which a set of decision trees was constructed in which half of the selected variables were obtained randomly, reaching a very important conclusion regarding this methodology: as a greater number of trees were added to the already created set, the performance of the model predictions also increased (in most cases, in a monotonic way).

Later, Breiman [16] introduced the “bagging” methodology (very similar to 
Random Forests) in which different subsets of an initial dataset are randomly selected 
as learning data according to which each tree will be developed. Finally, Breiman 
[17, 18] defined the Random Forest algorithm (or Breiman-Cutler algorithm), an 
alternative to the predominant “boosting” models [34], based on the “bagging” (of 
which it would be an extension) and on the contributions by Amit and Geman [5] 
(see [29]) who, focused on the recognition of written characters, made important 
advances in the study of a wide range of geometric characteristics, further advancing 
in the random selection of the best subdivision corresponding to each node. 

Random Forests, whose original denomination is actually “Breiman-Cutler Algorithm”, is a type of machine learning algorithm which categorizes something starting from the data at our disposal [37]. A Random Forest is “a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest” [17]. The idea underlying this classifier is to grow a large number of trees and then let them vote for the most popular class.

Internal estimates monitor the accuracy of a Random Forest by means of the generalization error, which depends on the strength of the individual trees and the dependence (correlation) between them. The Strong Law of Large Numbers shows that the generalization error of a Random Forest converges almost surely as the number of trees becomes large. At each node, several randomly selected features are considered to determine the split. Finally, internal estimates are also used to measure the importance of each variable.
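
For context, the dependence of the generalization error on strength and correlation can be written as the upper bound given by Breiman for Random Forests. The symbols used below are notation assumed here and are not defined elsewhere in this chapter: $PE^{*}$ denotes the generalization error, $s$ the strength of the individual classifiers and $\bar{\rho}$ their mean correlation.

```latex
PE^{*} \le \frac{\bar{\rho}\,\left(1 - s^{2}\right)}{s^{2}}
```

The bound decreases when the individual trees become stronger (larger $s$) or less correlated (smaller $\bar{\rho}$), which is the trade-off described in the paragraph above.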

Random Forests have been applied to current important problems (see, for 
example, [49]). In this paper, we are going to measure the importance of certain 
variables which determine life expectancy at birth. 

Bagging and Random Forest follow more or less the same algorithm: both average a set of models and are thereby able to reduce the variance of the predictors. However, bagging is very sensitive to an influential predictor, generating very similar trees in this case, which does not reduce the variance. Alternatively, Random Forest performs a random extraction of $m_i$ predictors and, unlike bagging, an average of $(p - m_i)/p$ divisions are obtained, where $p = \sum m_i$. This characteristic avoids the preponderance of a certain predictor or variable, reducing the correlation between predictors, and establishes the main difference between both learning-based methods: in bagging, all $p$ features are considered at each node for a split (or subdivision), whilst in the Random Forest algorithm only $m_i < p$ are considered.

The algorithm of Random Forest, derived from bagging predictors [16], was
described by Cutler et al. [27] in the following way:

1. We start from a $p$-dimensional random vector $X = (X_1, X_2, \ldots, X_p)^T$ which represents the predictor variables and a random variable $Y$ (called the real-valued response).

2. An unknown joint distribution, $P_{XY}(X, Y)$, is assumed.

3. The aim of this algorithm is to find a prediction function $f(X)$ for predicting $Y$. To do this, we minimize the expected value $E_{XY}(L(Y, f(X)))$, where $L(Y, f(X))$ is the so-called loss function.

4. $L(Y, f(X))$ is an indicator of how close $f(X)$ is to $Y$. In this paper, we have chosen the zero-one loss function, defined as:

$$L(Y, f(X)) = I(Y \ne f(X)) = \begin{cases} 0, & \text{if } Y = f(X) \\ 1, & \text{otherwise} \end{cases}$$

5. In this case, by denoting $\mathcal{Y}$ the set of possible values of $Y$, minimizing $E_{XY}(L(Y, f(X)))$ gives:

$$f(x) = \arg\max_{y \in \mathcal{Y}} P(Y = y \mid X = x)$$

6. For the $k$th tree ($k = 1, 2, \ldots, K$), we are going to generate a random vector $\Theta_k$, independent of the former random vectors $\Theta_1, \Theta_2, \ldots, \Theta_{k-1}$ but identically distributed, which results in a classifier, denoted by $h(x, \Theta_k)$, where $x$ is an input vector.

7. $f(x)$ is the most frequently predicted class:

$$f(x) = \arg\max_{y \in \mathcal{Y}} \sum_{k=1}^{K} I(y = h(x, \Theta_k))$$
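
Steps 6 and 7 can be illustrated with a deliberately small Python sketch. It is not the implementation used in this chapter: each "tree" is reduced to a single split (a decision stump), $\Theta_k$ is taken to be a bootstrap sample together with a random subset of $m$ candidate features, and the synthetic data, function names and parameter values are all assumptions made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """One-split tree: best (feature, threshold) among the candidate features."""
    best = None
    for j in features:
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not right:
                continue  # the largest value leaves the right branch empty
            left_class, right_class = majority(left), majority(right)
            errors = (sum(y != left_class for y in left)
                      + sum(y != right_class for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_class, right_class)
    if best is None:  # every candidate feature is constant in this sample
        label = majority(labels)
        return (features[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    """h(x, Theta_k) for a single stump."""
    j, threshold, left_class, right_class = stump
    return left_class if x[j] <= threshold else right_class


def fit_forest(rows, labels, n_trees, m, seed=0):
    """K = n_trees stumps; each one draws its own Theta_k from a seeded generator."""
    rng = random.Random(seed)
    p = len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # bootstrap indices
        features = rng.sample(range(p), m)                 # m < p candidate features
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], features))
    return forest


def predict_forest(forest, x):
    """f(x): the class receiving the most votes among the K trees (step 7)."""
    return majority([predict_stump(stump, x) for stump in forest])


# Synthetic data (hypothetical): the class depends on the first two of four features.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(4)] for _ in range(100)]
labels = ["Yes" if row[0] + row[1] > 1.0 else "No" for row in rows]
forest = fit_forest(rows, labels, n_trees=50, m=2)
accuracy = sum(predict_forest(forest, x) == y for x, y in zip(rows, labels)) / len(rows)
print(f"training accuracy of the toy forest: {accuracy:.2f}")
```

Real Random Forests grow full trees and redraw the candidate features at every node; with stumps there is only one node per tree, so drawing the features once per tree plays the same role here.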
4 Data


4.1 Description and Contextualization of Data 


In order to make the analysis as robust as possible, we have applied the Breiman-Cutler Algorithm to a database with a structure and arrangement very similar to the Iris database [33], which is one of the classic prototypes of multivariate research.

More specifically, and reiterating the different nature of the data on which our analysis is based, as summarized in Table 1, the values of the variable “life expectancy at birth” of 25 European countries have been collected throughout the 2000-2017 period, in conjunction with 8 variables with which a certain previous relationship is assumed and which our study aims to determine: public expenditure (on education, health, environmental protection and social protection), in addition to per capita income, educational level, inflation and unemployment of each considered country, all taken from the Eurostat database. In summary, the database consists of 4,050 data points (450 related to the life expectancy at birth variable and 3,600 corresponding to the aforementioned variables).


Table 1 Definition of the selected variables

LE     | Life expectancy at birth
EDU    | Education (1)
ENVIRO | Environmental protection (2)
HEALTH | Health (3)
INCO   | Gross National Income (4)
LEDU   | Educational level (5)
PI     | Inflation (6)
SOPRO  | Social protection (7)
UNEMP  | Unemployment (8)

(1) Total general government expenditure on education, percentage of Gross Domestic Product (GDP)
(2) Total general government expenditure on environmental protection, percentage of Gross Domestic Product (GDP)
(3) Total general government expenditure on health, percentage of Gross Domestic Product (GDP)
(4) Gross national income at current prices per capita (in thousands euro)
(5) Percentage of population with upper secondary, post-secondary non-tertiary and tertiary education (levels 3-8)
(6) Inflation rate, calculated as the harmonized index of consumer prices annual rate
(7) Total general government expenditure on social protection, percentage of Gross Domestic Product (GDP)
(8) Unemployment, percentage of active population

Source Own elaboration
In a subsequent step, given that at least one of the variables in the database has to be categorical and that our objective is to derive additive relationships between the variables [17], the variable “life expectancy at birth” has been categorized according to a specific procedure which will be indicated later. In the same way, we have considered that a possibly high or very high correlation between the eight variables could result in multicollinearity, rendering the results of this analysis non-significant. However, the presence of multicollinearity between the variables has been dismissed according to the conclusions drawn from Fig. 6 (see Appendix).

The heterogeneity of the variables included in the database requires a relatively in-depth exploratory analysis which shows, in broad strokes, an initial evolution of life expectancy at birth and the rest of the involved variables. Focusing exclusively on the evolution of life expectancy at birth during the analyzed period, the following intervals have been established, which include the relative frequencies of the variable for the total number of considered countries:


A: 70.70 < LE < 72.83
B: 72.83 < LE < 74.97
C: 74.97 < LE < 77.10
D: 77.10 < LE < 79.23
E: 79.23 < LE < 81.37
F: 81.37 < LE < 83.50

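
The six intervals have equal width, so they can be reproduced from the two end points alone. The Python sketch below shows this; the function names, the letter labels and the handling of values falling exactly on a boundary are assumptions, since the text does not state which end of each interval is closed.

```python
LOW, HIGH, N_BINS = 70.70, 83.50, 6
LABELS = "ABCDEF"  # assumed letter labels, one per interval


def interval_edges(low=LOW, high=HIGH, n_bins=N_BINS):
    """Equal-width interval boundaries, rounded to two decimals."""
    width = (high - low) / n_bins
    return [round(low + i * width, 2) for i in range(n_bins + 1)]


def interval_label(le, low=LOW, high=HIGH, n_bins=N_BINS):
    """Label of the interval containing a life expectancy value.

    Assumption: a value on a boundary goes to the upper interval, and values
    at or beyond the end points are clamped to the first or last interval.
    """
    width = (high - low) / n_bins
    index = int((le - low) / width)
    return LABELS[max(0, min(index, n_bins - 1))]


print(interval_edges())
print(interval_label(71.0), interval_label(78.0), interval_label(83.5))
```

Each interval spans about 2.13 years, i.e. one sixth of the range between the two end points.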


In relation to such intervals, Fig. 1 shows an area chart which exhibits a plausible self-sustained percentage growth of the variable “life expectancy at birth”, especially in the upper intervals (D, E and F), which account for cumulative relative frequencies of almost 75% in 2017. In other words, and according to the former intervals, the countries analyzed at the end of the period show a joint life expectancy at birth greater than 77.1 years in terms of relative frequencies.

Fig. 1 Evolution of the relative frequencies of the variable “life expectancy at birth”, according to the six aforementioned intervals. Source Own elaboration


Likewise, before starting the implementation of a Random Forests classification 
approach which shows the relationship between “life expectancy at birth” and the rest 
of analyzed variables, we have considered it mandatory to determine if the involved 
countries present certain common patterns which could be aggregated into significant 
hierarchical clusters. This would help us to contextualize this variable according to 
each country. In this way, a three-level circular dendrogram is attached in Fig. 2, 
constructed from the average life expectancy at birth by country between 2000 and 
2017. 

Figure 2 shows two large groups or clusters of nations in relation to life expectancy at birth: the European countries of the former “Eastern” area which, until the late 1980s, had planned economies, and the “Western” area. Nevertheless, this last group of nations cannot be considered a homogeneous cluster per se, since in some of these countries there was also a non-negligible state intervention in their economies.
Fig. 2 Circular dendrogram of the involved countries, elaborated in function of the average life
expectancy at birth for the analyzed period (2000-2017). Source Own elaboration 
The disparity in the values of life expectancy at birth and the rest of the analyzed variables between the countries belonging to the “Western” area (A) and those belonging to the “Eastern” area (B) is clearly visible in Table 2 of descriptive statistics. Observe that there is a significant gap between the two areas but also that there is a convergence of life expectancy at birth across all countries. Therefore, the fact that a country belonged to the former Eastern area does not seem a sufficient reason or argument to systematically determine a reduced life expectancy at birth, and reciprocally. This reasoning can be extended to other variables included in our database, especially per capita income, which today is higher in some Eastern countries than in several countries of Western Europe.

This trend towards a gradual convergence of life expectancy at birth in both areas can be seen in Table 4 (see Appendix), in which the countries with the highest growth of this variable in absolute terms are precisely those coming from the former Eastern area, although they started from very small values at the beginning of the period (2000) compared with the countries of the Western area. Among these countries, “late starters” in a competitive economy or economies still in transition [47], two nations must be highlighted: Estonia and Slovenia. In effect, the government efforts of the Baltic countries have led to a huge increase in life expectancy at birth, specifically 7.3 years in just 17 years (see Table 4 in Appendix), to such an extent that, according to Tiit [83], by the end of the twenty-first century it will reach the levels of Finland, a country which is geographically and culturally close to Estonia and whose lead a few decades ago seemed unattainable. The Slovenian case is also remarkable, describing an unusual increase that, according to [52], is largely due to the drastic reduction in the mortality rate of the older population groups, derived from a substantial improvement of preventive medicine in this country. Moreover, we have to add the fact that it has reached a level of per capita income comparable to or even higher than that of some Western countries (e.g., Greece and Portugal).

As formerly indicated, a high or reduced life expectancy at birth should not be ascribed to any specific geographic area; nevertheless, we do consider it relevant to discriminate the analyzed countries according to their area of origin. Therefore, we have added a new categorical variable to Table 2 which differentiates the countries coming from each area. A priori, this variable would behave as a “collector” variable which must present non-redundant information, mainly focused on indicating the sociocultural differences between both areas after a long period in which European countries remained divided [53] according to several indicators, such as their different access to basic medicines [47] or their different exposure to pollution.

In a very schematic way, Fig. 3 represents a flow diagram which abstractly relates the categorical variables “Area” (that is to say, Eastern Europe versus Western Europe) and “life expectancy at birth” (Increase/Decrease). In effect, it shows a relatively important flow of Eastern European countries with increasing life expectancy at birth and, conversely, another flow (in the opposite direction) between Western countries and a decrease in this variable, which reinforces our decision to include the categorical variable “Area” in the Random Forest algorithm.

Fig. 3 Alluvial diagram of the categorical variables “Area” (Eastern Europe/Western Europe) versus life expectancy at birth (Decrease/Increase). Source Own elaboration


Finally, we have to determine how the variable “life expectancy at birth” has been categorized. Categorizing a variable does not, in itself, pose many problems; however, it is somewhat more delicate to obtain a categorization which offers a global assessment sufficiently representative of its temporal evolution, taking into account each of the 25 countries included in the database. For this purpose, we have opted for categorizing this variable according to the behaviour of its average in each country, based on the following dichotomous relationship:

$$\text{If } LE(\text{Country}_i/\text{Year}_j) > \text{Average}(LE/\text{Year}_{j=2000,\ldots,2017}) \Rightarrow \text{Life expectancy increasing} = \text{Yes}$$

$$\text{If } LE(\text{Country}_i/\text{Year}_j) < \text{Average}(LE/\text{Year}_{j=2000,\ldots,2017}) \Rightarrow \text{Life expectancy decreasing} = \text{No}$$
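
A minimal Python sketch of this dichotomous rule follows, assuming that the average is taken over the years of a single country. The function name `categorize` and the sample values are hypothetical and serve only as illustration; they are not Eurostat data.

```python
from statistics import mean


def categorize(le_by_year):
    """Label each year 'Yes' if life expectancy exceeds the country's own
    average over the period, and 'No' otherwise."""
    average = mean(le_by_year.values())
    return {year: ("Yes" if value > average else "No")
            for year, value in le_by_year.items()}


# Hypothetical values for one country (illustration only, not Eurostat data).
sample = {2000: 71.0, 2005: 73.0, 2010: 76.0, 2017: 78.0}
labels = categorize(sample)
print(labels)
```

In this sketch a value exactly equal to the average is labelled “No”, which is an assumption, since the rule above does not cover that case.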


4.2 Random Forests Implementation 


The application of the classification mode of the Breiman-Cutler algorithm allows us to dissect to what extent the variables included in the model have a greater or lesser relative importance for the target variable (LE). This has led us to the following specifications, summarized in Table 3: 315 observations have been used to construct the model, with 6 variables tried at each subdivision (or split) and 10,000 random (decision) trees.

The estimate of the “out-of-bag” (OOB) error rate is calculated by using the observations which are not present in the “bag”, i.e. the subset of the learning data set used in the iterative construction of each decision tree. This unbiased estimate indicates the error rate to be expected when the built model is applied to new observations. For the generated model, the estimated OOB error is very small, 0.5%, so we can deduce that the generated model is “reasonably good”, in the jargon usually used in the application of this algorithm, reaching a really high accuracy of 99.5%. Usually, increasing the number of random trees entails a decrease in the OOB error; however, Fig. 4 shows that, due to the robustness of the constructed model, this rate remains constant from a relatively small number of random trees (around 400).

Table 3 Random forests: model diagnosis

Summary of the achieved Random Forests model results implemented in the variable “life expectancy at birth”

Number of observations used to build the model: | 315
Type of Random Forests:                          | Classification
Number of trees:                                 | 10,000
No. of variables tried at each split:            | 6
OOB estimate of error rate:                      | 0.5%

Confusion matrix

    | NO  | YES | class.error
NO  | 106 | 1   | 0.009345794
YES | 2   | 206 | 0.009615385

Area under the curve: 95% CI: 0.9792-1 (see [30])

Variable importance

       | No     | Yes   | MeanDecreaseAccuracy | MeanDecreaseGini
INCO   | 230.00 | 58.81 | 183.60               | 29.88
AREA   | 182.30 | 76.79 | 179.14               | 52.31
LEDU   | 51.53  | 53.20 | 58.91                | 4.53
SOPRO  | 38.12  | 22.41 | 41.70                | 1.16
EDU    | 24.16  | 18.58 | 28.74                | 0.54
HEALTH | 11.23  | 22.67 | 20.66                | 0.34
PI     | 20.79  | 8.84  | 19.08                | 0.53
ENVIRO | 11.15  | 2.84  | 11.08                | 0.15
UNEMP  | 3.92   | 2.12  | 4.55                 | 0.15

Source Own elaboration

Fig. 4 OOB errors for Random forests. Source Own elaboration

The confusion matrix compiles the degree of disagreement between the predictions obtained by the constructed model and the learning (or training) observations (315): the actual observations correspond to the rows of the table and the model predictions to the columns, so that each cell gives the number of observations in the corresponding category. Again, an almost total agreement is obtained between the model predictions and the learning data: both give “Yes” (increase in life expectancy at birth) in 206 cases and “No” (decrease in life expectancy at birth) in 106 cases, showing a very low level of disagreement (class errors of 0.93% for “No” and 0.96% for “Yes”).
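
The class errors reported with the confusion matrix are simply the share of misclassified observations in each row. The short Python sketch below recomputes them from the four cell counts of Table 3; the dictionary layout and function names are assumptions chosen for illustration.

```python
# Rows are the actual classes and columns the predicted classes (counts from Table 3).
CONFUSION = {
    "NO": {"NO": 106, "YES": 1},
    "YES": {"NO": 2, "YES": 206},
}


def class_errors(matrix):
    """Share of misclassified observations within each actual class."""
    errors = {}
    for actual, row in matrix.items():
        total = sum(row.values())
        errors[actual] = (total - row[actual]) / total
    return errors


def overall_error(matrix):
    """Share of misclassified observations implied by the whole matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return (total - correct) / total


errors = class_errors(CONFUSION)
print(errors, overall_error(CONFUSION))
```

The row totals (107 and 208) add up to the 315 observations used to build the model.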

In order to diagnose the reliability of our model, we have used the procedure of the “Area under the ROC (Receiver Operating Characteristic) curve” based on the DeLong test [30], to avoid starting from excessively restrictive assumptions of normality, as is often the case with other commonly used tests (e.g., the binormal test): the result, quite close to 1, is, in principle, indicative of an optimal diagnosis, which points to a high reliability of the generated Random Forests model. The interpretation of a single decision tree is usually trivial; however, in a Random Forests model (hence its versatility in the classification of variables), the relative importance of each variable must be taken into account, mainly according to two criteria: Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG), from which an order of priority can be established taking into account the relative importance, as detailed in Fig. 5.
Fig. 5 Relative importance of the variables implemented in the Random Forests. Source Own 
elaboration 


Finally, the relative importance of each variable included in the constructed 
Random Forests model, graphically described in Fig. 5, could be summarized in 
the following orders, where it can be seen that both criteria are practically coincident 
and that, in both cases, the first four variables have a relative importance substantially 
greater than the rest (see Table 3): 

MDA: INCO (1), AREA (2), LEDU (3), SOPRO (4), EDU (5), HEALTH (6), PI 
(7), ENVIRO (8), UNEMP (9). 

MDG: AREA (1), INCO (2), LEDU (3), SOPRO (4), EDU (5), PI (6), HEALTH 
(7), ENVIRO (8), UNEMP (9). 


5 Discussion 


Our results show that, regardless of the criterion used (MDA or MDG), the variables 
with the greatest relative importance in explaining life expectancy at birth are 
ranked in a similar way. Consequently, per capita income, the variable “area” 
(representative of the sociocultural differences between Eastern and the rest of 
European countries), the educational level of the population, and the public expenditure 
on social protection are the variables with the highest relevance, followed by public 
expenditure on education and health. On the contrary, the least important variables 
are inflation, public environmental expenditure and unemployment. 
With respect to per capita income, our findings are consistent with previous studies 
that identify it as the major determinant of population health status [44, 55]. In 
effect, higher per capita income affects health status by improving nutrition, access 
to health care and working conditions. In some studies, the analysis has focused on 
GDP per capita and these results have not been altered [7]. Most empirical works have 
concluded that higher income is associated with greater longevity [45, 19]. In this 
way, these results confirm the Preston [71] curve, which relates national income to 
average life expectancy at birth for a range of countries at one point in time and 
shows that people living in rich countries live longer on average than people in poor 
countries. However, this positive association should be interpreted with caution 
because some countries with high life expectancy at birth are not developed countries [75]. 

Sociocultural differences between Eastern and the rest of European countries seem 
to have a relevant role in explaining health outcomes. One of the most extensively 
studied sociocultural factors is the welfare regime [4, 8, 20, 21, 31]. In this way, Navarro and 
Shi [64] and Navarro et al. [63] study the differences between four welfare state 
regimes (grouped in terms of political traditions), where countries which have had 
long periods of government with redistributive political parties (traditionally, social 
democratic countries) were found to have lower infant mortality rates and, to a lesser 
extent, increased life expectancy. 

The educational attainment of the population is one of the socioeconomic factors 
included in the empirical works in this field. Better education is associated with a 
healthier population because people with higher education are more likely to have 
better jobs and higher incomes and to make healthier behavioral choices [26]. Our 
findings reveal that education is of great importance in explaining life expectancy 
at birth and coincide with previous empirical research suggesting a strong relation 
between education and health. Reynolds and Avendano [72] and Ketency and Murthy [46] 
point out that more schooling and higher educational attainment are associated with 
better health. In addition, the relevance of education can be more intense depending 
on the type of country. At this 
point, Alvarez-Galvez et al. [4], by using individual data, support that the importance 
of education-related inequalities on self-rated health surpasses differences in income 
and occupational status, especially in Southern- and Eastern-European countries. 

Public expenditure on social protection is included in the group of the most impor- 
tant socioeconomic factors related to health status in our results. Previous studies have 
confirmed the positive effect of social expenditure on health outcomes [14, 58, 60, 80, 
89], with the exception of Karim et al. [45] and Alvarez-Galvez and Jaime-Castillo 
[3], among others. 

Additional components of the public budget considered in this work, such as educational 
and health expenditures, have registered less relative relevance in the model. 
Although education investment contributes to gains in life expectancy [72, 88], our 
results seem to show that this area of government expenditure has less influence than 
public expenditure on social protection. On the other hand, in international studies 
there is no consensus when identifying the contribution of healthcare expenditures. 
In the case of public expenditures, some works reveal a positive significant contribution 
to health outcomes [32, 50, 51, 68, 85], although in some cases no significant 
impact is detected [76] and, in other studies, this effect is moderately adverse 
[12, 70]. 

The least important socioeconomic factors related to life expectancy in our work 
seem to be inflation, public environmental expenditure and unemployment. However, 
Monsef and Mehrjardi’s [62] results suggest that inflation is one of the main economic 
factors which negatively influence life expectancy. In this line, Bai et al. [7] find that 
inflation is negatively associated with life expectancy for the Belt and Road countries, 
at the bottom of the life expectancy quantiles for men. When prices rise, people suffer 
a reduction in the proportion of goods and services they can consume, such as 
medical treatments and drugs. So inflation can harm health, but this does not mean 
that lower inflation is positive for health. 

Our research indicates that unemployment is not an important factor in explaining life 
expectancy. However, previous researchers have concluded that this factor is strongly 
related to health status. On the one hand, Bai et al. [7] and Bartoll and 
Mari-Dell’Olmo [10] have documented a positive association between unemployment 
and health outcomes in countries with higher life expectancy. On the other hand, a 
negative relationship is identified [9, 62, 79] because of an increase in the prevalence 
of psychological distress, chronic diseases and functional limitations, among others. 

The variable with the lowest relative importance in our analysis is public environmental 
protection expenditure, confirming the results of Martin-Cervantes et al. [58]. In 
this case, it must be highlighted that there is very little empirical evidence on the 
relationship between life expectancy and environmental expenditure. Consequently, 
more research must be devoted to this instrument of environmental protection policy. 


6 Conclusions 


Employing Random Forests methodology, this work has classified the socioeco- 
nomic factors according to their relative importance on life expectancy at birth in 
European countries from 2000 to 2017. The main policy recommendation is that 
European strategies intended to have an impact on health outcomes should focus not 
only on improvements in “traditional” variables such as per capita income and education 
but also on the sociocultural context (mainly in Eastern countries) and public 
expenditure on social protection. More education is needed to learn healthy 
habits and social behaviors. On the other hand, higher income allows better access 
to medicines, hospitals and healthier lifestyles. In addition, it is expected that the 
reform of different cultural or institutional factors, such as the welfare regime, can 
influence life expectancy. So, policy makers have the responsibility of designing these 
changes from a public health perspective. Finally, more attention has to be paid 
to the composition of the public budget since, unlike what might be expected, public 
expenditures on social protection seem to be more relevant for health outcomes than 
healthcare expenditures. 
A limitation of this study is that lifestyle factors such as smoking, obesity and 
drinking, among others, have not been considered because they are not available for a 
consistent period. Future research should include this type of information and expand 
the objective of this study in order to compare specific differences in the relevance of 
socioeconomic variables on health status among European countries. 


Appendix 


See Table 4 and Fig. 6 


Table 4 Life expectancy at birth performance along the analyzed period 

Country          Life expectancy (2000)  Life expectancy (2017)  Period lag  2 years consecutive lag average 
Belgium          77.9 (O)                81.6 (O)                3.7 (U)     0.22 years (2 months and 19 days) (U) 
Bulgaria         71.6 (U)                74.8 (U)                3.2 (U)     0.19 years (2 months and 8 days) (U) 
Cyprus           77.1 (O)                82.2 (O)                4.5 (O)     0.26 years (3 months and 6 days) (O) 
Czechia          75.1 (U)                79.1 (U)                4 (U)       0.24 years (2 months and 25 days) (U) 
Denmark          76.9 (O)                81.1 (O)                4.2 (O)     0.25 years (3 months and 0 days) (O) 
Estonia          71.1 (U)                78.4 (U)                7.3 (O)     0.43 years (5 months and 6 days) (O) 
Finland          77.8 (O)                81.7 (O)                3.9 (U)     0.23 years (2 months and 23 days) (U) 
France           79.2 (O)                82.7 (O)                3.5 (U)     0.21 years (2 months and 15 days) (U) 
Germany          78.3 (O)                81.1 (O)                2.8 (U)     0.16 years (2 months and 0 days) (U) 
Greece           78.6 (O)                81.4 (O)                2.8 (U)     0.16 years (2 months and 0 days) (U) 
Hungary          71.9 (U)                76 (U)                  4.1 (O)     0.24 years (2 months and 28 days) (O) 
Ireland          76.6 (O)                82.2 (O)                5.6 (O)     0.33 years (4 months and 0 days) (O) 
Italy            79.9 (O)                83.1 (O)                3.2 (U)     0.19 years (2 months and 8 days) (U) 
Lithuania        72.1 (U)                75.8 (U)                3.7 (U)     0.22 years (2 months and 19 days) (U) 
Luxembourg       78 (O)                  82.1 (O)                4.1 (O)     0.24 years (2 months and 28 days) (O) 
Malta            78.5 (O)                82.4 (O)                3.9 (U)     0.23 years (2 months and 23 days) (U) 
Netherlands      78.2 (O)                81.8 (O)                3.6 (U)     0.21 years (2 months and 17 days) (U) 
Poland           73.8 (U)                77.8 (U)                4 (U)       0.24 years (2 months and 25 days) (U) 
Portugal         76.8 (O)                81.6 (O)                4.8 (O)     0.28 years (3 months and 13 days) (O) 
Romania          71.2 (U)                75.3 (U)                4.1 (O)     0.24 years (2 months and 28 days) (O) 
Slovakia         73.3 (U)                77.3 (U)                4 (U)       0.24 years (2 months and 25 days) (U) 
Slovenia         76.2 (U)                81.2 (O)                5 (O)       0.29 years (3 months and 17 days) (O) 
Spain            79.3 (O)                83.4 (O)                4.1 (O)     0.24 years (2 months and 28 days) (O) 
Sweden           79.8 (O)                82.5 (O)                2.7 (U)     0.16 years (1 months and 27 days) (U) 
United Kingdom   78 (O)                  81.3 (O)                3.3 (U)     0.19 years (2 months and 10 days) (U) 
Average          76.31                   80.32                   4.00        0.24 (2 months and 25 days) 

O: Over the average 
U: Under the average 

Source Own elaboration 
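
The last column of Table 4 appears to be the period lag divided by the 17 yearly steps between 2000 and 2017 and then expressed in months and days. The sketch below reproduces that conversion under assumptions inferred from the table values and not stated in the text: a 365-day year, 30-day months, and truncation to whole days. The function name is hypothetical.

```python
def average_yearly_gain(le_2000, le_2017, n_years=17):
    """Return (period lag, lag per year, months, days) for one country.

    Assumed conventions: 365-day year, 30-day months, days truncated.
    """
    period_lag = round(le_2017 - le_2000, 1)
    per_year = period_lag / n_years
    total_days = int(per_year * 365)       # truncate to whole days
    months, days = divmod(total_days, 30)  # 30-day months
    return period_lag, round(per_year, 2), months, days


print(average_yearly_gain(77.9, 81.6))  # Belgium: (3.7, 0.22, 2, 19)
print(average_yearly_gain(71.1, 78.4))  # Estonia: (7.3, 0.43, 5, 6)
print(average_yearly_gain(79.8, 82.5))  # Sweden:  (2.7, 0.16, 1, 27)
```

With these conventions the rows checked here agree with the table, which supports reading the column as the average gain in life expectancy between two consecutive years.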
Fig. 6 Variables correlation plot. Source Own elaboration 
The Role of the Communication 
and Information in Decision-Making 
Problems 


Viviana Ventre, Angelarosa Longo, and Fabrizio Maturo 


Abstract Rational models in decision processes fail because of cognitive and moti- 
vational biases that produce different effects mainly linked to mental models of each 
person. We point out the importance of the level of information of each problem solver 
in causing inconsistent preferences in a decision process both in cooperative decision 
problems and in non-cooperative ones. We demonstrate that only in particular cases 
does the optimal, rational decision match the real one, which is influenced by different 
kinds of emotions. Furthermore, behavioural biases also influence the modeller, so 
how the decision model is constructed and presented is different depending on the 
modellers, and these differences also can influence the final choice of each agent. In 
the end, we can obtain different results with the same model. 


Keywords Behavioral operational research · Intertemporal choice · Information in 
decision processes 


1 Introduction 


Several studies point out anomalies that violate some axioms in the traditional 
Discounted Utility (DU) model derived from the effects of emotions on behaviour 
[1-4]. Some studies show positive effects of emotions that can be modelled with 
hyperbolic delay discounting, whereas other studies show negative effects mainly 
manifested as impulsivity. 

The mental models of each person influence emotions (positive and negative) and 
impulsivity and, in turn, mental models are influenced by the level of information held 
by the problem solver. Based on Thaler and Shefrin’s model, we demonstrate that in a 
multi-agent decision context, information about others and about the problem influences 
the final choice [5, 6]. We illustrate this with examples of the two opposite 
ends: (1) lack of information, and (2) excess of information, in which the divergence 
from traditional models is generated by effects due to emotions (false consensus and 
overconfidence). If there is no information, as happens in non-cooperative decision 
problems, people consider their own decisions more important than those of other 
people from the same population (false consensus effect). Instead, if all information 
is explicit, people will consider the choices of other people as more informative than 
their own (excess of consensus or overconfidence). An example is a cooperative decision 
problem. In this case, the final decision is also influenced by how information 
is passed: misunderstandings and manipulations change people’s reactions. 

Creating a model to predict a decision process is itself a decision process, and also 
the modellers are influenced by their mental models. Therefore, different modellers 
could come up with different results, even when using the same modelling tools. 
Communication in modelling has a vital role because a different way of showing and 
discussing the data can lead to distinct behavioural consequences, especially if 
stakeholders have dissimilar cultural and educational backgrounds. Also, 
misunderstandings of phenomena can depend on the way the phenomena are described. 
Modellers can influence the results of a decision process merely by acting on the 
presentation of the problem. We demonstrate this with an example in non-cooperative 
decision problems where there is a lack of information, and thus the mental models of each 
decision-maker can be influenced only by priming and framing effects linked to the 
construction and presentation of the model itself. 


2 Behavioural Effect on Intertemporal Discounting 


2.1 Standard Discount Model 


In intertemporal choice, human behaviour has been modelled by Operational Research 
(OR) in terms of the DU model, according to which a person’s intertemporal preference 
is time consistent. 

However, Decision Neuroscience shows that there are several behaviour patterns 
(cognitive and motivational biases) that violate rational choice theory, and hence the 
traditional discounting model [7]. They may significantly change the decision analysis, 
generating inconsistent preferences in intertemporal choices (see also [8]). 

One of the most critical anomalies, which contradict discounted utility, is the delay 
effect: discount rates tend to be lower as waiting time increases, that is, higher over 
short intervals than over long ones. We can set out this effect as follows: 


$$(x, s) \sim (y, t) \quad \text{but} \quad (x, s + h) \prec (y, t + h), \qquad \text{for } y > x,\ s < t \text{ and } h > 0$$


2.2 Positive Effects of Emotions 


Several studies showed that people with emotional dysfunction tend to perform 
poorly compared with those who are endowed with intact emotional processes 
(see e.g. [9-11]). These studies demonstrated that normal people possess anticipatory 
SCRs (Skin Conductance Responses), indices of somatic states, that is, unconscious 
biases linked to prior experiences. These biases lead to inconsistent preferences 
because they alert the normal individual against selecting a disadvantageous course of 
action, even if he does not yet know the goodness or badness of the choice. Therefore, 
many psychologists and economists support the replacement of the notion of exponential 
discounting by some form of a hyperbolic discount model, in which the per-period 
discount factor increases (that is, the discount rate declines) with the delay to an 
outcome. The hyperbolic model can thus represent the delay effect and supports dynamic 
inconsistency: because non-exponential time-preference curves can cross [12], it can 
represent preference reversals between two rewards available at different times. One 
of the many hyperbolic discount functions has the following form: 


$$d(t) = \left( \frac{1}{1 + \alpha t} \right)^{\beta / \alpha}$$


where β > 0 is the degree of discounting and α > 0 is the departure from exponential 
discounting. This model can represent a wide range of behaviours [8]. 
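
A short numerical sketch may help to see how this function produces a preference reversal while exponential discounting does not. It assumes the form d(t) = (1 + αt)^(-β/α), whose instantaneous discount rate is β/(1 + αt) and therefore falls with the delay; the rewards, dates and parameter values below are hypothetical and chosen only for illustration.

```python
import math


def hyperbolic(t, alpha=4.0, beta=1.0):
    """Generalized hyperbola d(t) = (1 + alpha*t)^(-beta/alpha)."""
    return (1.0 + alpha * t) ** (-beta / alpha)


def exponential(t, rate=0.4):
    """Standard exponential discount function exp(-rate*t)."""
    return math.exp(-rate * t)


# Hypothetical choice: smaller-sooner (x at s) versus larger-later (y at t).
x, s = 100.0, 0.0
y, t = 140.0, 1.0


def prefers_later(discount, h):
    """True if (y, t + h) is valued above (x, s + h) under `discount`."""
    return y * discount(t + h) > x * discount(s + h)


for h in (0.0, 5.0):
    print(h, prefers_later(hyperbolic, h), prefers_later(exponential, h))
# h = 0.0: hyperbolic False, exponential False (both take the sooner reward)
# h = 5.0: hyperbolic True,  exponential False (only the hyperbolic agent reverses)
```

Adding the same delay h to both options leaves the exponential comparison unchanged, because the ratio exp(-rate*(t + h)) / exp(-rate*(s + h)) does not depend on h, whereas under the hyperbolic function the larger, later reward eventually wins.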


2.3 Negative Effects of Emotions 


According to other studies, normal individuals can lose their self-control. This fact 
produces temptation, which, in many cases, induces disadvantageous behaviour [10]. 

Other studies show that, in defined circumstances, negative emotions in normal 
individuals can generate counterproductive effects, even when they do not lead to the 
loss of self-control, while individuals deprived of normal emotional reactions can 
make more profitable decisions [13]. 

These inconsistent preferences can typically be seen in psychiatric disorders 
(alcoholism, drug abuse), but also in more ordinary phenomena (overeating, credit card 
debt); addicts are more myopic (have larger time-discount rates) in comparison with 
non-addicted populations and act more impulsively, so their choice differs 
from their own previously intended plan. These problematic behaviours can also be 
explained with the hyperbolic discounting model, but researchers have proposed 
two discount models to better describe the neural and behavioural correlates of 
impulsivity and inconsistency in intertemporal choice. 


1. Q-exponential discount model 
This function has been proposed and examined for the subjective value V(D) of a 
delayed reward: 


$$V(D) = \frac{A}{\exp_q(k_q D)} = \frac{A}{[1 + (1 - q) k_q D]^{1/(1-q)}}$$


where D denotes a delay until receipt of a reward, A the value of a reward at D 
= 0, and k_q a parameter of impulsivity at delay D = 0 (q-exponential discount 
rate) and the q-exponential function is defined as: 


$$\exp_q(x) = [1 + (1 - q) x]^{1/(1-q)}$$


The function can distinctly parametrize impulsivity and inconsistency [14, 8]. 
2. Quasi-hyperbolic discount model 

This model is based on the behavioural economists’ theory according to which the 
inconsistency in intertemporal choice is attributable to an internal conflict 
between “multiple selves” within a decision-maker. They consider that each 
individual has (at least) two exponential discounting selves (with at least two 
exponential discount rates), where the self with a smaller discount rate wins 
when delayed rewards are in the distant future (>1 year), and the self with a larger 
discount rate wins when delayed rewards approach the near future (within a 
year), resulting in preference reversal over time. The quasi-hyperbolic discount 
model (also known as the β-δ model) can parametrize this intertemporal choice behaviour. 
For discrete time t (the unit assumed is one year) it is defined as: 


$$F(t) = \beta \delta^{t} \ (\text{for } t = 1, 2, 3, \ldots) \quad \text{and} \quad F(0) = 1 \qquad (0 < \beta < \delta < 1).$$


A discount factor between the present and one time period later (βδ) is smaller 
than that between two future time periods (δ). In continuous time, the 
proposed model is equivalent to the linearly weighted two-exponential functions 
(generalized quasi-hyperbolic discounting): 


$$V(D) = A\,[\,w \exp(-k_1 D) + (1 - w) \exp(-k_2 D)\,]$$


where w, 0 < w < 1, is a weighting parameter and k_1 and k_2 are two exponential 
discount rates (k_1 < k_2). Note that the larger of the two exponential discount 
rates, k_2, corresponds to an impulsive self, while the smaller discount rate k_1 
corresponds to a patient self [8]. 
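
The two models above can be sketched in a few lines. The code assumes the usual definition exp_q(x) = [1 + (1 - q)x]^(1/(1-q)), under which q = 0 gives simple hyperbolic discounting A/(1 + k_q D) and q approaching 1 gives exponential discounting; all parameter values are hypothetical.

```python
import math


def exp_q(x, q):
    """q-exponential function; reduces to exp(x) in the limit q -> 1."""
    if abs(1.0 - q) < 1e-12:
        return math.exp(x)
    return (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))


def v_q_exponential(delay, amount, k_q, q):
    """Subjective value V(D) = A / exp_q(k_q * D)."""
    return amount / exp_q(k_q * delay, q)


def quasi_hyperbolic(t, beta, delta):
    """Discrete beta-delta discount function: F(0) = 1, F(t) = beta * delta**t."""
    return 1.0 if t == 0 else beta * delta ** t


def v_two_exponential(delay, amount, w, k1, k2):
    """Weighted sum of a patient (k1) and an impulsive (k2) exponential self."""
    return amount * (w * math.exp(-k1 * delay) + (1.0 - w) * math.exp(-k2 * delay))


print(v_q_exponential(2.0, 100.0, 0.5, 0.0))  # q = 0: hyperbolic, 100 / (1 + 1)
print(v_q_exponential(2.0, 100.0, 0.5, 1.0))  # q = 1: exponential, 100 * exp(-1)

beta, delta = 0.7, 0.9  # hypothetical, with 0 < beta < delta < 1
print(quasi_hyperbolic(1, beta, delta) / quasi_hyperbolic(0, beta, delta))  # beta*delta
print(quasi_hyperbolic(2, beta, delta) / quasi_hyperbolic(1, beta, delta))  # delta
```

The last two lines show the present bias of the β-δ model: the step from today to next year is discounted by βδ, while any later one-year step is discounted by δ only.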
3 The Importance of Mental Models 


Mental models play an important role in these behavioural issues because they 
generate impulsivity, which influences every individual choice. Mental models are 
informal models constructed by problem solvers that help them to relate cause and 
effect; they are therefore present in each phase of the problem-solving process, 
whether it is a single-agent decision process or a multi-agent one, and they are always 
influenced by personal preferences and experiences, sometimes causing positive and 
sometimes negative effects. 


3.1 Thaler and Shefrin Model 


In the setting of Multiple Selves Models, Thaler and Shefrin proposed a “planner-doer” 
model to control impulsivity, which draws upon principal-agent theory [13]. 
According to this theory, each individual has two distinct kinds of psyche: one planner, 
which pursues longer-run results, and multiple doers, which are concerned only with 
short-term satisfaction. Thaler and Shefrin gave this example in [13]: consider an 
individual with a fixed income stream y = [y_1, y_2, ..., y_T], where Σ_t y_t = Y, 
which has to be allocated over the finite interval (0, T). The planner would choose 
a consumption plan to maximize his utility function 


$$V(Z_1, Z_2, \ldots, Z_T) \quad \text{subject to} \quad \sum_{t=1}^{T} c_t \le Y$$


in which each Z_t is a function of the utility of the level of consumption in t (c_t). 
On the other hand, an unrestrained doer 1 would borrow Y − y_1 on the capital market 
and therefore choose c_1 = Y; the resulting consequence is naturally 
c_2 = c_3 = ... = c_T = 0. Such action would suggest a complete absence of psychic 
integration. 

In their model, Thaler and Shefrin consider that the planner can control the 
behaviour of the doers in two ways: (a) by imposing rules, which alter the 
restrictions imposed on any given doer; or (b) by using discretion and some method 
to change the incentives or rewards to the doer but not self-imposed restrictions [8]. 
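
The contrast between the planner and an unrestrained doer can be illustrated with a toy calculation. The sketch below is not Thaler and Shefrin's specification: it assumes that V is the plain sum of identical concave per-period utilities (square root), with no discounting and no interest, in which case the planner's optimum is an even split of Y. The income stream is hypothetical.

```python
import math


def period_utility(c):
    """Assumed concave per-period utility of consumption (illustrative)."""
    return math.sqrt(c)


def lifetime_utility(plan):
    """Assumed planner objective: V as the sum of per-period utilities."""
    return sum(period_utility(c) for c in plan)


def planner_plan(income):
    """With identical concave utilities and no discounting, an even split of Y is optimal."""
    total, periods = sum(income), len(income)
    return [total / periods] * periods


def unrestrained_doer_plan(income):
    """Doer 1 borrows Y - y_1 and consumes everything at once: c_1 = Y, the rest 0."""
    return [float(sum(income))] + [0.0] * (len(income) - 1)


income = [10.0, 20.0, 30.0, 40.0]  # hypothetical stream, Y = 100

for plan in (planner_plan(income), unrestrained_doer_plan(income)):
    print(plan, lifetime_utility(plan))
# [25.0, 25.0, 25.0, 25.0] 20.0
# [100.0, 0.0, 0.0, 0.0] 10.0
```

Both plans exhaust the same budget Y, but the doer's plan gives a lower lifetime utility under these assumptions, which is the gap that rules or altered incentives are meant to close.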


4 The Influence of Information in a Multi-agent Decision 
Process 


In many decision processes, what information is known and how it is represented 
play an essential role (see e.g. [4, 15-19]). This is particularly true in multi-agent 
decision problems (see e.g. [20-22]), in which all the people involved have their 
intrinsic mental models, and the emotions of each agent can influence group behaviour, 
modifying their mental models. 

In a multi-agent decision problem, each individual makes his intertemporal choice 
considering others’ preferences, because the agents need to achieve a joint decision. 
Indeed, as explained in Squillante and Ventre [22], “group decision problems consist in 
finding the best alternative(s) from a set of feasible alternatives A = {d_1, ..., d_n} 
according to the preferences provided by a group of agents E = {e_1, ..., e_m}.” The 
objective is to obtain the maximum degree of agreement among the agents’ overall 
performance judgments on the alternatives. 

We demonstrate that in this process different levels of availability or quality 
of the information about the problem or other participants have an effect on the 
dynamics of the problem-solving process and can produce a different reaction of the 
participants, and so can lead to a different solution of the same case. We analyse three 
different situations linked to different levels of information: (1) lack of information, 
(2) excess of information and (3) erroneous information. 


4.1 Lack of Information 


We should not see the emotional effects caused by the different levels of information 
when information about other members of the decision group is absent, and therefore 
implicit. 

However, research in other areas of social judgement has revealed that people 
judge others in the same way that they judge themselves: the false consensus effect. 
Consequently, in a multi-agent decision problem, each decision-maker overestimates 
his own opinion, and this effect becomes more pervasive when people lack the 
necessary data on which to base their judgments [22]. Engelmann and Strobel demonstrated 
that when the agents in a decision group have explicit information about the choices 
of other members of their group, this information acts in the opposite way to the false 
consensus effect. On the contrary, when information is implicit, a false consensus 
effect arises in the group [23]. 

We can consider a multi-agent decision problem as a strategic interaction, so in 
the absence of information and cooperation among the participants, it can be 
assimilated to a non-cooperative game. According to Thaler and Shefrin’s model, in 
this kind of game we cannot implement any “precommitment” to govern the 
impulsive part (doer). 

As a consequence, we can demonstrate that it is not possible to recognise the best 
choice on a rational basis [24]. We analyse the traditional non-cooperative multi-agent 
decision problem, the “prisoner’s dilemma”, on one temporal interval and with 
only two alternatives, according to which “the police arrest two suspects (A and B) 
and interrogate them in separate rooms. Each can either confess, thereby implicating 
the other, or keep silent”. In terms of years in prison, the payoffs for each strategy 
are as follows: 
                               Agent A 
                               Confess (C)    Do not confess (NC) 
Agent B  Confess (C)           5, 5           0, 10 
         Do not confess (NC)   10, 0          1, 1 
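
The rational benchmark for this table can be checked mechanically by computing each agent's best reply. In the sketch below each cell is read as (years for Agent B, years for Agent A), which is an assumption about the ordering, and the (C, C) cell, which is not legible in the source, is taken to be 5 years each; any value between 1 and 10 gives the same conclusion.

```python
STRATEGIES = ("C", "NC")

# YEARS[(b, a)] = (years in prison for B, years in prison for A).
# Fewer years is better. The (C, C) entry is an assumed value.
YEARS = {
    ("C", "C"): (5, 5),
    ("C", "NC"): (0, 10),
    ("NC", "C"): (10, 0),
    ("NC", "NC"): (1, 1),
}


def best_reply_of_b(a):
    """Strategy of B that minimizes B's sentence, given A's strategy."""
    return min(STRATEGIES, key=lambda b: YEARS[(b, a)][0])


def best_reply_of_a(b):
    """Strategy of A that minimizes A's sentence, given B's strategy."""
    return min(STRATEGIES, key=lambda a: YEARS[(b, a)][1])


for other in STRATEGIES:
    print(other, best_reply_of_b(other), best_reply_of_a(other))
# C C C
# NC C C
```

Confessing is the best reply to either strategy of the other agent, so (C, C) is the outcome for two rational players, even though (NC, NC) would leave both with a shorter sentence.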


In the absence of information, this set of payoffs moves each player to confess, 
assuming two rational players with consistent preferences, even if the best solution 
is that both “do not confess”. As a consequence, there is a paradoxical situation in 
which rational players reach a worse outcome than irrational ones. If 
we analyse this situation using Thaler and Shefrin’s model, we can say that the doer 
of each player drives his agent to make the decision that gives him the greater 
advantage and, applying the theory of the “false consensus effect”, each one thinks 
that the other will do the same. So, there is a good chance that the doer of each 
prisoner will choose the strategy “do not confess”, in contrast to rational theory, 
and that the agents reach the best strategy (NC, NC) as their common decision. 
However, in other types of non-cooperative problems, this mechanism may not lead to a 
common strategy, so the players will never achieve a joint decision without a prior 
agreement. This can be seen in another example: a multi-agent decision problem in 
which the agents set out to save money to make a common purchase. Each agent has a 
fixed income, Y_A and Y_B, and a nonnegative level of saving, S_A and S_B. In this 
case, the planner of each agent wants to maximise his utility function of saving 
(thinking of the future), and the doer of each agent wants to consume Y now to realise 
the highest advantage at present. Then each impulsive doer, thinking that everybody 
will act in the same way because of the false consensus effect, assigns weight = 1 to 
one preference and weight = 0 to all the others. In this case, the agents cannot attain 
a common objective in the long term. So, to make the common purchase, the agents need 
to set some rule or some method to alter the incentives for the doers. The following 
example can represent the problem: 


                            Agent A 
                            Save (S)    Do not save (NS) 
Agent B  Save (S) 
         Do not save (NS) 


where the payoffs represent the utility of each agent for each strategy. 

According to the rational choice, the Nash equilibrium matches the best 
strategy (S, S). However, the false consensus effect and impulsivity lead each agent to 
the worst equilibrium (NS, NS); indeed, the utility functions of the agents are 
influenced by their impulsive doers, who prefer to obtain a greater advantage in the 
short term before the other does the same, so both choose “do not save”. Hence, they 
do not achieve a consensus on a common decision, and the theory of non-cooperative 
games fails because of the lack of confidence. 
In conclusion, if there is no information and no cooperation, each agent is influenced 
by his own doer, making an individual choice rather than a strategic one. Furthermore, 
each agent thinks that the other will do the same because of the false consensus effect. 
Then the utility function of the planners does not coincide with the utility function of 
their doers, so the prevailing party determines the strategic choice. Therefore, in a 
non-cooperative multi-agent decision problem, two situations are possible: 


1. the utility function of the planner coincides with the utility function of the doer 
for each agent, and they correspond among all agents; then the agents will reach a 
common decision that is given by the unanimous choice, 

2. at least one doer has a preference different from those of the other doers and of 
his planner, so he assigns weight = 0 to the other preferences, and it is not possible 
to aggregate them. 


Hence, we can assert that in a non-cooperative decision problem, a joint decision is 
obtained only by chance. 


4.2 Excess of Information 


According to Engelmann and Strobel’s experiment, there is no false consensus effect 
if representative information is highly prominent and retrievable without any effort. 
Instead, there is an essential effect in the opposite direction, indicating that subjects 
consider others’ choices more informative than their own [23]. This is “overconfidence” 
or “groupthink”, a psychological phenomenon which can occur in highly 
trained cohesive groups. Hence, in the extreme event in which everything is known in the 
decision-making process, the interplay between different subjects still involves other 
behavioural effects, such as the excess of consensus, apart from the influence of 
the behavioural effects and mental models of each individual. 

Cooperative games can model, in the OR field, this kind of decision-making 
process. In these games, the rationality of the equilibrium choice is saved by the 
possibility of making an agreement among agents, which represents a pure rule to 
maintain self-control at a later time in Thaler and Shefrin’s model. Furthermore, if 
there is an arrangement the agents have explicit information about the choices of other 
members, so there is no false consensus effect, in line with the result of Engelmann 
and Strobel’s experiment. The only decision which will prevail is determined by the 
strength of each mental model. An example of a cooperative game is a coordination 
game in which players choose their strategies by consensus decision-making [24]. A 
traditional coordination game is the “battle-of-the-sexes”. As indicated in [24], in 
this game an engaged couple must choose what to do in the evening: the man prefers 
to attend a baseball game, and the woman prefers to attend an opera. In terms 
of utility, the payoff for each strategy is as follows: 
                       Man 
                       Opera (O)    Baseball (B) 
Woman  Opera (O)       3, 1         0, 0 
       Baseball (B)    0, 0         1, 3 


In this example, there are multiple equilibria: (B, B) and (O, O). Both players
would rather do something together than go to separate events,
so no single individual has an incentive to deviate if the other conforms to an
outcome. In this context, a consensus decision-making process can be considered
an instrument for choosing the best strategy in a coordination game. A common final
decision is achievable only if the man and the woman have explicit information and
if there is cooperation.
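
The two equilibria can be checked by enumerating unilateral deviations. The following is a minimal Python sketch, not part of the original analysis; it assumes that the first number in each cell of the payoff table is the woman's payoff and the second the man's, and the names `PAYOFFS` and `pure_nash_equilibria` are illustrative.

```python
from itertools import product

# Payoffs from the table: keys are (woman's choice, man's choice),
# values are (woman's payoff, man's payoff). The ordering is an assumption.
STRATEGIES = ["O", "B"]
PAYOFFS = {
    ("O", "O"): (3, 1),
    ("O", "B"): (0, 0),
    ("B", "O"): (0, 0),
    ("B", "B"): (1, 3),
}


def pure_nash_equilibria(payoffs, strategies):
    """Return the strategy pairs from which no player gains by deviating alone."""
    equilibria = []
    for woman, man in product(strategies, repeat=2):
        woman_payoff, man_payoff = payoffs[(woman, man)]
        # The woman cannot do better by changing only her own choice.
        woman_ok = all(payoffs[(alt, man)][0] <= woman_payoff for alt in strategies)
        # The man cannot do better by changing only his own choice.
        man_ok = all(payoffs[(woman, alt)][1] <= man_payoff for alt in strategies)
        if woman_ok and man_ok:
            equilibria.append((woman, man))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, STRATEGIES))  # [('O', 'O'), ('B', 'B')]
```

The enumeration confirms that neither equilibrium is singled out by the payoffs alone, which is why an explicit agreement is needed to select one.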

In terms of Thaler and Shefrin’s model, the agents can elude the influence of
doers because they can enforce contracts between the parties at period one. The
“pre-commitment” removes the problem of loss of self-control, because it eliminates all
choices. However, the only one who will maximise his utility function will be the
agent with the most reliable mental model, so the decision can change for each
pair of players.


4.3 Erroneous Information 


In a multi-agent decision problem, the information held by the participants can be
wrong for two causes (not independent of each other):


1. misunderstandings and 
2. manipulations. 


In the first case, the different reactions of people to the same problem with the same
information are connected not with people’s lack of cognitive abilities but with the
way the situation is described in the communication. The second case is the same,
except that one member of the group deliberately misrepresents the problem out of
self-interest. Self-interest is the primary cause of biases, particularly in participatory
processes with multiple stakeholders. It drives strategic behaviour, producing all
priming and framing effects [25].

Finally, in a cooperative game, erroneous information can modify the mental
models of the agents and so change the strategic solution that they finally choose.
For example, the arrangement established to obtain a shared decision can be
influenced by the strategic behaviour of each agent aiming to change the choices
of the others.


5 The Influence of Communication: BOR (Behavioral
Operational Research)


5.1 Behavioural Perspective in Decision Analysis 


A behavioural perspective is necessary for decision analysis: if we do not
include behavioural effects in theoretical models, they appear anyway when we use the
model in real-life problem solving, as happens in OR models. We should accept the
fact that we are unlikely to produce a “perfect” model but can still find one
that is useful. However, “modelling is not about models only, but it matters how we
choose the models and how we work with the models” [25]. Even if we can prove
that there is an optimal policy or decision based on the model of the real world, the
process by which the real world is simplified into a model is itself subject to
behavioural effects [26].

Creating a model to manage and solve problems is a process composed of many 
phases, and human behaviour controls each stage of the process and mediates the 
progression through steps [26]. Hence the behavioural lens needs to be integrated in 
the practice of OR as an additional perspective. Behavioural Operational Research 
(BOR) “considers the human impact on the process of using operational research 
(OR) methods in problem-solving and decision support as well as using OR methods 
to model human behaviour” [25]. 

Modellers, too, are subject to cognitive and motivational biases and to the way the
decision problems are framed. Furthermore, in many models it can be very hard to
change the initial modelling choices later, and these can have a crucial impact on the
path the modelling process follows (path dependency) [25].


5.2 Role of Communication and System Perspective


Communication is an essential part of modelling. The modeller should not
concentrate only on perfecting the accuracy of the model; the process and the
communication count a lot too. Opposite results can be obtained depending on the way
the phenomenon is described, how the questions are phrased, and which graphs are used.
This may affect the behaviour and preferences of the participants, as in the case in
which one player gives erroneous information to the others.

Hämäläinen et al. [26], through a series of experiments on the understanding
of and communication about dynamic systems, showed that the communication
phase of the OR process can be sensitive to priming and framing effects. Moreover,
the purpose for which the model is developed is reflected in the parameters and scales
as well as in the level of detail used: “there is not a single valid model fitting every
purpose” [25]. Modellers should identify the possibility of strategic behaviour by the
participants, for example, the misrepresentation of preferences or data in an
environmental participation process. However, modellers are also guided by self-interest
and so can use their role to influence or manipulate people’s behaviour.

Furthermore, when a modeller starts her work, she is already part of the social
system herself. This system is created by all the people involved in the problem
solving, with their intrinsic mental models, intentions, expectations and cultural
habits. The challenge for the modeller is to act within this overall complex, which
includes both the environmental system under study and the organisation of the social
problem-solving system. Modellers need to think about what is or should be part of the
models used. They need to recognise the strategic and motivational goals in
communication which influence and create systems. They need to be sensitive to the role
of emotion in people’s decisions. “The modeller needs to widen his skill set from
systems thinking to systems intelligence. Systems intelligence refers to our ability to
act intelligently and productively in systemic settings and to see the interconnections
and leverage points in the system.” [25].


6 Cooperative and Non-cooperative Multi-agent Decision 
Problem in BOR Perspective 


The role of the modellers’ communication can be shown more easily in the
non-cooperative multi-agent decision problem. As we have seen in Sect. 4, in this kind
of decision problem there is no information about the other members of the decision
group and no cooperation among them, so the final strategic decision is not influenced
by information. Nevertheless, the final choice for the same problem (with the same
model) can be different when the participants or the time of the choice change,
because of the ability of self-control and the utility function of every single agent.
The different levels of self-control and the different utility functions among agents,
or for the same agent in different periods, depend on each mental model, which
is influenced by preferences and personal experiences.

However, we have seen that cognitive biases in decision making are also related
to the way the decision problems are framed, and modellers are also subject to a
number of other cognitive biases. Motivational biases can also influence the quality
of modelling by distorting the parameter elicitation and expert judgments, and the
cumulative effects can produce further biases [25].

For example, in the case of the prisoner’s dilemma, we can obtain different results 
if the modellers present the problem differently. The modellers could establish more 
years of prison in case both confess, instead of in case both do not confess: 


                                  Agent A
                                  Confess (C)    Do not confess (NC)
Agent B   Confess (C)             1, 1           0, 10
          Do not confess (NC)     10, 0          5, 5


In that case, the doers of the two players drive their agents to confess, each believing
that the other will do the same because of false consensus.
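
The pull towards confession in this framing can be made explicit by computing each agent's best reply. The sketch below is only illustrative: it assumes that each cell of the table lists the years of prison for Agent B first and for Agent A second (the text does not state the ordering), and that each agent simply minimises his own years.

```python
# Years of prison from the table, keyed by (B's action, A's action) and read
# as (years for B, years for A); this ordering is an assumption.
YEARS = {
    ("C", "C"): (1, 1),
    ("C", "NC"): (0, 10),
    ("NC", "C"): (10, 0),
    ("NC", "NC"): (5, 5),
}
ACTIONS = ["C", "NC"]


def best_reply_of_b(action_a):
    """Action of B that minimises B's own years, given A's action."""
    return min(ACTIONS, key=lambda action_b: YEARS[(action_b, action_a)][0])


def best_reply_of_a(action_b):
    """Action of A that minimises A's own years, given B's action."""
    return min(ACTIONS, key=lambda action_a: YEARS[(action_b, action_a)][1])


replies_b = {a: best_reply_of_b(a) for a in ACTIONS}
replies_a = {b: best_reply_of_a(b) for b in ACTIONS}
print(replies_b)  # {'C': 'C', 'NC': 'C'}
print(replies_a)  # {'C': 'C', 'NC': 'C'}
```

Under these assumptions, confessing is the best reply to either action of the other agent, so the presentation of the sentences alone is enough to steer both agents towards (C, C).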

Hence, the way the phenomenon is described, how the questions are phrased,
and which graphs are used influence the behaviour and preferences of the participants,
and therefore their mental models, their utility functions, their ability of
self-control and, finally, their choice.

This happens even in the cooperative multi-agent decision problem, in which
these effects are added to all the other effects derived from information internal to
the decision group (excessive or erroneous).

In conclusion, we can affirm that the failure of rational models is first linked with
the presentation of the problem to the agents. The first aspect that influences the final
choice in a decision process is the model of the problem, which relates to the mental
models of the modellers. But even the mental models of the modellers are influenced
by all the effects of the information that they hold and of the goal that they want to
attain. Depending on the level of information about the problem held by modellers,
they can be influenced by a false consensus effect or by overconfidence to choose the
model strategically. This conclusion suggests new research along this line.


Parameter Estimation of Fuzzy Linear
Regression Utilizing Fuzzy Arithmetic
and Fuzzy Inverse Matrix


Murat Alper Basaran and Biagio Simonetti 


Abstract The parameter estimation methods of fuzzy linear regression available in
the literature have adopted several techniques, ranging from mathematical programming
to fuzzy least squares, which is based on the notion of distance. Therefore, the
literature is vast. However, researchers are still looking for a method that yields
estimated spreads as small as possible together with good predictions. Each parameter
estimation method proposed in the literature has treated fuzzy numbers according to how
it handles the data. Namely, the proposed methods somehow restrict the usage
of the fuzzy data. In this chapter, a novel method consisting of new notions and
calculation procedures is proposed in order to find the parameter estimates of fuzzy
linear regression.


Keywords Fuzzy arithmetic · Fuzzy zero number · Fuzzy one number · Fuzzy
identity matrix · Fuzzy inverse matrix · Parameter estimation in fuzzy regression


1 Introduction 


Since the first paper published by Tanaka [20], the subject has attracted many
researchers. Studies of different aspects of fuzzy linear regression, ranging from
parameter estimation to applications, have contributed to the literature.
The most studied aspect of fuzzy linear regression is the estimation of the parameters.
Different models have been investigated based on the types of parameters or variables
using various methods. The most generic type of fuzzy linear regression is the one given
in (1.1), where the dependent variable, the independent variables and the parameters
are fuzzy numbers.


$$\hat{\tilde{Y}}_k = \tilde{A}_0 + \tilde{A}_1 \tilde{X}_{k1} + \tilde{A}_2 \tilde{X}_{k2} + \cdots + \tilde{A}_j \tilde{X}_{kj} + \cdots + \tilde{A}_m \tilde{X}_{km} \tag{1.1}$$

where $\tilde{A}_j$ is the jth coefficient of the fuzzy linear regression, $\tilde{X}_{kj}$
is the fuzzy independent variable denoting the jth variable of the kth observation, and
$\hat{\tilde{Y}}_k$ is the kth estimated dependent variable. As said before, the related
literature is vast. Different types of fuzzy linear regression models have been examined
based on the different types of data. For example, the simplest case uses crisp data for
both the independent and dependent variables, while the relationship between those
variables is assumed to be fuzzy.

Other types of data structures are investigated when fuzziness is assumed between
variables. Some interesting results can be found in [1, 3, 4, 6–8, 11, 13, 16, 19, 21, 22].
Useful parameter estimates are still being sought, since several methods
end up with much fuzzier estimates, which make no sense in terms of either application
or methodology. Therefore, different types of parameter estimation methods have
been developed. Roughly speaking, two major families of parameter estimation methods,
mathematical programming based methods and fuzzy least squares, have been
investigated and modified by researchers aiming at more reliable and
less fuzzy parameter estimates. Each parameter estimation method proposed in
the literature has treated fuzzy numbers according to how it handles the data. Namely,
the proposed methods somehow restrict the usage of the fuzzy data to what
the corresponding method is able to carry out. For example, fuzzy numbers are
split into parts such as center values and spread values that are suitable for linear
programming. On the other hand, the fuzzy least squares method proposed by [6, 7,
11] allows fuzzy numbers to be written in the form of a metric which involves the
squares of the differences between the center values and between the spread values of
the two fuzzy numbers, respectively. Then those two quantities are added to form a
minimization problem subject to some constraints. Therefore, either selecting a
well-known method or defining a new one for determining fuzzy parameters somehow
restricts the treatment of fuzzy numbers. In this chapter, we focus on the model given
in (1.1), whose parameter estimates are sought. Computing a fuzzy inverse matrix
becomes feasible with some newly introduced notions and definitions, such as the fuzzy
zero number, the fuzzy one number and the fuzzy identity matrix, which lead to the fuzzy
inverse matrix; these can be found in detail in [2]. Utilizing those new notions and
definitions enables us to compute the parameter estimates of the fuzzy linear regression
by directly employing fuzzy arithmetic throughout all the computational steps. The
chapter is organized as follows: Sect. 2 briefly presents the notions and
definitions of fuzzy set theory that are already known to researchers. Section 3
gives detailed accounts of both the fuzzy inverse matrix calculation and fuzzy linear
equation systems (FLES), which will be the basis for constructing the computational
procedure in the next section. Section 4 explains how to calculate the parameter
estimates using the information presented in the previous section. Section 5 concludes
the discussion.


2 Preliminary 


In this section, we briefly mention some fundamental concepts and definitions and
give useful references where the interested reader can find detailed discussions of
the subject. We do not skip these basics, since they form the basis for constructing
the notions that are helpful in parameter estimation.


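Definition 1 ([4]) A fuzzy number $\tilde{M}$ is a fuzzy set defined on the set of real
numbers $\mathbb{R}$ characterized by means of a membership function $\mu_{\tilde{M}}(x)$,
$\mu_{\tilde{M}} : \mathbb{R} \to [0, 1]$, where

$$\mu_{\tilde{M}}(x) = \begin{cases} 0 & \text{for } x < a, \\ f_A(x) & \text{for } a \le x \le c, \\ 1 & \text{for } c \le x \le d, \\ g_A(x) & \text{for } d \le x \le b, \\ 0 & \text{for } x > b. \end{cases}$$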


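Definition 2 ([12]) A fuzzy number $\tilde{M}$ is called a number of LR type if its
membership function $\mu_{\tilde{M}}$ has the following form:

$$\mu_{\tilde{M}}(x) = \begin{cases} L\left(\dfrac{m - x}{\alpha}\right) & \text{for } x \le m, \\ R\left(\dfrac{x - m}{\beta}\right) & \text{for } x \ge m, \end{cases}$$

where L and R are continuous non-increasing functions, defined on $[0, \infty)$, strictly
decreasing to zero in those subintervals of the interval $[0, \infty)$ in which they are
positive, and fulfilling the conditions L(0) = R(0) = 1. The spread parameters
$\alpha$, $\beta$ are non-negative real numbers.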


Generally, L and R are called shape functions. Also, LR type fuzzy numbers are
written in the parametric form as follows [12]:

$\tilde{M} = (m, \alpha, \beta)_{LR}$, where $\alpha$ and $\beta$ are called the left and
right spreads, respectively. When the spreads are equal, $\alpha = \beta$, the fuzzy
number $\tilde{M}$ is called a symmetric fuzzy number [10].

Based on the parametric definition of LR type fuzzy numbers, arithmetic operations
are defined by [12]. Addition and multiplication are given here for illustrative
purposes. More detailed explanations and all formulas can be found in [12].

For two positive LR type fuzzy numbers $\tilde{M} = (m, \alpha, \beta)_{LR}$ and
$\tilde{N} = (n, \gamma, \delta)_{LR}$, addition is given as follows:

$$\tilde{M} \oplus \tilde{N} = (m, \alpha, \beta)_{LR} \oplus (n, \gamma, \delta)_{LR} = (m + n, \alpha + \gamma, \beta + \delta)_{LR}$$

Multiplication is given as follows:

$$\tilde{M} \otimes \tilde{N} = (m, \alpha, \beta)_{LR} \otimes (n, \gamma, \delta)_{LR} \approx (mn, m\gamma + n\alpha, m\delta + n\beta)_{LR}$$


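It should be noted that the resulting fuzzy number is an approximated result.
Scalar multiplication is given as follows:

$$\lambda \otimes (m, \alpha, \beta)_{LR} = \begin{cases} (\lambda m, \lambda\alpha, \lambda\beta)_{LR}, & \lambda > 0, \\ (\lambda m, -\lambda\beta, -\lambda\alpha)_{RL}, & \lambda < 0. \end{cases}$$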
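
The three operations above translate directly into code. The following minimal Python sketch is an illustration rather than the authors' implementation; the class name `LRNumber` and the method `scale` are hypothetical, the product is the approximate rule for two positive fuzzy numbers, and a zero scalar is treated like a positive one (an assumption, since the formula covers only non-zero scalars).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LRNumber:
    """LR-type fuzzy number (m, alpha, beta): center, left spread, right spread."""
    m: float
    alpha: float
    beta: float

    def __add__(self, other):
        # (m, a, b) + (n, c, d) = (m + n, a + c, b + d)
        return LRNumber(self.m + other.m, self.alpha + other.alpha, self.beta + other.beta)

    def __mul__(self, other):
        # Approximate product, valid for two positive fuzzy numbers:
        # (m, a, b) * (n, c, d) ~ (m*n, m*c + n*a, m*d + n*b)
        return LRNumber(
            self.m * other.m,
            self.m * other.alpha + other.m * self.alpha,
            self.m * other.beta + other.m * self.beta,
        )

    def scale(self, lam):
        # A non-negative scalar scales all three parts;
        # a negative scalar also swaps the two spreads.
        if lam >= 0:
            return LRNumber(lam * self.m, lam * self.alpha, lam * self.beta)
        return LRNumber(lam * self.m, -lam * self.beta, -lam * self.alpha)


x = LRNumber(8, 1, 1.5)
y = LRNumber(10, 1, 1)
print(x + y)        # LRNumber(m=18, alpha=2, beta=2.5)
print(x * y)        # LRNumber(m=80, alpha=18, beta=23.0)
print(x.scale(-2))  # LRNumber(m=-16, alpha=3.0, beta=2)
```

Note how the spreads of the product grow with the centers of both factors, which is why repeated multiplication quickly produces wide estimates.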

It should be noted that, for the sake of simplicity, all fuzzy numbers used in the
computations throughout the chapter are chosen to be symmetric triangular fuzzy
numbers. The fuzzy matrix was mentioned roughly by [12] as a rectangular
array of fuzzy numbers. The formal definition given below is due to [9]:


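Definition 3 ([9, 10]) A matrix $\tilde{A} = (\tilde{a}_{ij})$ is called a fuzzy matrix
if each element of $\tilde{A}$ is a fuzzy number. Also, let $\tilde{A} = (\tilde{a}_{ij})$
and $\tilde{B} = (\tilde{b}_{ij})$ be two m×n and n×p fuzzy matrices. The size of the
product of the two fuzzy matrices is m×p and it is written as follows:

$$\tilde{A} \otimes \tilde{B} = \tilde{C} = (\tilde{c}_{ij}),$$

where $\tilde{c}_{ij} = \sum^{\oplus}_{k = 1, \ldots, n} \tilde{a}_{ik} \otimes \tilde{b}_{kj}$ and $\otimes$ is the approximated multiplication.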
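
A fuzzy matrix product in this sense can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: fuzzy numbers are stored as plain `(center, left spread, right spread)` tuples, all entries are positive so that the approximate multiplication rule applies, and the helper names are hypothetical.

```python
def fuzzy_add(p, q):
    """Sum of two LR fuzzy numbers given as (center, left spread, right spread)."""
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def fuzzy_mul(p, q):
    """Approximate product of two positive LR fuzzy numbers."""
    m, a, b = p
    n, c, d = q
    return (m * n, m * c + n * a, m * d + n * b)


def fuzzy_matmul(A, B):
    """Product of an m x n and an n x p fuzzy matrix (lists of rows of triples)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions do not match")
    result = []
    for row in A:
        new_row = []
        for j in range(len(B[0])):
            entry = (0, 0, 0)  # neutral element for fuzzy_add
            for k in range(n):
                entry = fuzzy_add(entry, fuzzy_mul(row[k], B[k][j]))
            new_row.append(entry)
        result.append(new_row)
    return result


A = [[(2, 1, 1), (3, 0, 1)]]    # 1 x 2 fuzzy matrix
B = [[(4, 1, 0)], [(5, 1, 2)]]  # 2 x 1 fuzzy matrix
print(fuzzy_matmul(A, B))  # [[(23, 9, 15)]]
```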


In a similar fashion, a fuzzy matrix $\tilde{A}$ consists of center, left spread and
right spread parts, just as a fuzzy number does. This can be written in a parametric
form as follows.

$\tilde{A} = (A, L, R)$, where A, L and R are crisp matrices denoting the center matrix,
the left spread matrix and the right spread matrix, respectively. They all have the
same size [12].

The positiveness, negativeness, and being zero of a fuzzy number can be defined
as follows.


Definition 4 [10] When the support of a fuzzy number is given as $[a_1, a_2]$, the fuzzy
number is said to be positive if $0 < a_1 < a_2$. Similarly, a fuzzy number is said to be
negative if $a_1 < a_2 < 0$. Finally, it is called a fuzzy zero number if $a_1 < 0 < a_2$.


Based on Definition 4, fuzzy one and fuzzy zero numbers can be defined. In fact,
a modified version of the definition of the fuzzy zero number can be
obtained by putting restrictions on $a_1$ and $a_2$. Those definitions are given in the
next section.


3 Calculating Fuzzy Inverse Matrix


This section consists of two subsections. The first covers the basic introductory
information on fuzzy linear equation systems (FLES). The second introduces a modified
version of FLES, which is an important tool for computing the fuzzy inverse matrix.
FLES is an active research area which mainly examines the vector solution of systems
of various types, ranging from fully fuzzy linear systems, whose components are all
fuzzy numbers, to mixed types, whose components are both fuzzy and crisp numbers.
The solution procedures can in general be based on different methods such as the
extension principle, interval arithmetic and fuzzy arithmetic. We briefly mention the
key studies on finding vector solutions of those systems. The interested
reader can find a more comprehensive list in the reference sections of
[3, 5, 9, 10, 14, 15, 17, 18]. In the second part, the modified version of
FLES is introduced with the help of some new definitions and notions in order
to compute the fuzzy inverse matrix. This computation will help determine the fuzzy
coefficients of fuzzy linear regression in the next section.


3.1 Fuzzy Linear Equation Systems 


The generic type of FLES is the fully fuzzy linear equation system studied by [9].
The other types are simplified versions of the generic type, and their solution
procedures are based on various methods. This subject is out of the scope of this
manuscript. The general form of FLES is given as follows:

$$\tilde{A} \otimes \tilde{x} = \tilde{b} \tag{3.1}$$

where $\tilde{A}$ is a fuzzy coefficient matrix, $\tilde{x}$ is the unknown fuzzy vector
and $\tilde{b}$ is a fuzzy right-hand side vector. When (3.1) is written as a system of
linear equations, (3.2) is obtained as follows:

$$\begin{aligned}
(\tilde{a}_{11} \otimes \tilde{x}_1) \oplus (\tilde{a}_{12} \otimes \tilde{x}_2) \oplus \cdots \oplus (\tilde{a}_{1n} \otimes \tilde{x}_n) &= \tilde{b}_1 \\
(\tilde{a}_{21} \otimes \tilde{x}_1) \oplus (\tilde{a}_{22} \otimes \tilde{x}_2) \oplus \cdots \oplus (\tilde{a}_{2n} \otimes \tilde{x}_n) &= \tilde{b}_2 \\
&\;\;\vdots \\
(\tilde{a}_{n1} \otimes \tilde{x}_1) \oplus (\tilde{a}_{n2} \otimes \tilde{x}_2) \oplus \cdots \oplus (\tilde{a}_{nn} \otimes \tilde{x}_n) &= \tilde{b}_n
\end{aligned} \tag{3.2}$$

As emphasized before, the solution procedures of this system have so far been studied
using different methods such as the extension principle, interval arithmetic and fuzzy
arithmetic. In [9], this system is solved using fuzzy arithmetic in order to find the
fuzzy vector $\tilde{x}$.


3.2 Calculating Fuzzy Inverse Matrix 


In order to find the inverse of a given fuzzy matrix, the system given above is modified
into the one below:

$$\tilde{A} \otimes \tilde{X} = \tilde{I}, \tag{3.3}$$

where $\tilde{A}$ is the fuzzy coefficient matrix just as in (3.1). The fuzzy vector
$\tilde{x}$ in (3.1) is replaced by the fuzzy matrix $\tilde{X}$, the unknown matrix
that represents the fuzzy inverse of $\tilde{A}$, and the right-hand side fuzzy matrix
$\tilde{I}$ is called the fuzzy identity matrix. Before moving to the computational
procedure, we need to define some notions such as the fuzzy zero number, the fuzzy one
number, and the fuzzy identity matrix.


Definition 5 [2] If the center value of a fuzzy number is 1 and the left and right
spread values are $\delta$ and $\lambda$, where $0 < \delta, \lambda < 1$, it is called a
fuzzy one number and is denoted by $\tilde{1} = (1, \delta, \lambda)$.


It is easily seen that the spread values $\delta$, $\lambda$ take values between 0 and 1.
Therefore, the left and right end points of a fuzzy one number satisfy
$0 < 1 - \delta$ and $1 + \lambda < 2$. A similar definition can be given for the fuzzy
zero number [2].


Definition 6 [2] If the center value of a fuzzy number is 0 and the left and right
spread values are $\alpha$ and $\beta$, where $0 < \alpha, \beta < 1$, it is called a
fuzzy zero number and is denoted by $\tilde{0} = (0, \alpha, \beta)$.


Therefore, the left and right end points of a fuzzy zero number satisfy
$-1 < 0 - \alpha$ and $0 + \beta < 1$. With these new definitions, the so-called fuzzy
identity matrix can be defined. When $\lambda = \delta$ and $\alpha = \beta$, both the
fuzzy one and the fuzzy zero are called symmetric numbers. For the sake of simplicity,
all calculations are based on symmetric ones.


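Definition 7 If the diagonal elements of a fuzzy matrix are fuzzy one numbers and
the off-diagonal elements are fuzzy zero numbers, then it is called a fuzzy identity
matrix and is denoted by $\tilde{I}$:

$$\tilde{I} = \begin{bmatrix} \tilde{1} & \tilde{0} & \cdots & \tilde{0} \\ \tilde{0} & \tilde{1} & \cdots & \tilde{0} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{0} & \tilde{0} & \cdots & \tilde{1} \end{bmatrix}$$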
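
As an illustration of Definitions 5 to 7, the sketch below builds a symmetric fuzzy identity matrix in Python. It is a hypothetical helper, not taken from [2]: fuzzy numbers are stored as `(center, left spread, right spread)` tuples, one common spread is used for all fuzzy ones and another for all fuzzy zeros, and the spread bounds are taken as printed in the definitions.

```python
def fuzzy_identity(n, one_spread, zero_spread):
    """n x n symmetric fuzzy identity matrix as rows of (center, left, right) triples."""
    for name, spread in (("one_spread", one_spread), ("zero_spread", zero_spread)):
        # Definitions 5 and 6 require spreads strictly between 0 and 1.
        if not 0 < spread < 1:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    fuzzy_one = (1, one_spread, one_spread)      # symmetric fuzzy one number
    fuzzy_zero = (0, zero_spread, zero_spread)   # symmetric fuzzy zero number
    return [[fuzzy_one if i == j else fuzzy_zero for j in range(n)] for i in range(n)]


identity = fuzzy_identity(2, 0.1, 0.2)
for row in identity:
    print(row)
# [(1, 0.1, 0.1), (0, 0.2, 0.2)]
# [(0, 0.2, 0.2), (1, 0.1, 0.1)]
```

Unlike the crisp identity matrix, a fuzzy identity matrix is not unique: every admissible choice of spreads gives another one.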


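The equation given in (3.3) can be rewritten in equation form as follows:

$$\begin{aligned}
(\tilde{a}_{11} \otimes \tilde{x}_{11}) \oplus (\tilde{a}_{12} \otimes \tilde{x}_{21}) \oplus \cdots \oplus (\tilde{a}_{1n} \otimes \tilde{x}_{n1}) &= \tilde{1} \\
(\tilde{a}_{21} \otimes \tilde{x}_{11}) \oplus (\tilde{a}_{22} \otimes \tilde{x}_{21}) \oplus \cdots \oplus (\tilde{a}_{2n} \otimes \tilde{x}_{n1}) &= \tilde{0} \\
&\;\;\vdots \\
(\tilde{a}_{n1} \otimes \tilde{x}_{1n}) \oplus (\tilde{a}_{n2} \otimes \tilde{x}_{2n}) \oplus \cdots \oplus (\tilde{a}_{nn} \otimes \tilde{x}_{nn}) &= \tilde{1}
\end{aligned} \tag{3.4}$$

Solving the equation system in (3.4) gives the inverse of the fuzzy matrix $\tilde{A}$.
A more detailed treatment of the procedure can be found in [2].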


4 Parameter Estimation in Fuzzy Linear Regression 


The fuzzy linear regression model given in (1.1) consists of fuzzy independent
variables, a fuzzy dependent variable and fuzzy parameters. This type of model has been
examined by researchers utilizing the methods mentioned previously in this
manuscript [1, 6–8, 11, 13, 16, 20, 22]. In this chapter, the fuzziness of the data is
directly exploited using fuzzy arithmetic in order to calculate the parameter estimates
of the model given in (1.1), based on the new notions and definitions presented
in the previous section. We begin with the model given in (1.1), which is as follows:

$$\hat{\tilde{Y}}_k = \tilde{A}_0 + \tilde{A}_1 \tilde{X}_{k1} + \tilde{A}_2 \tilde{X}_{k2} + \cdots + \tilde{A}_j \tilde{X}_{kj} + \cdots + \tilde{A}_m \tilde{X}_{km}.$$

Although the constant term $\tilde{A}_0$ can be easily computed by inserting a fuzzy
vector consisting of fuzzy one numbers into the proposed method, it is excluded from the
model for the sake of simplicity. Therefore, (1.1) is rewritten as follows:


$$\hat{\tilde{Y}}_k = \tilde{A}_1 \tilde{X}_{k1} + \tilde{A}_2 \tilde{X}_{k2} + \cdots + \tilde{A}_j \tilde{X}_{kj} + \cdots + \tilde{A}_m \tilde{X}_{km}, \tag{4.1}$$

where $\tilde{A}_j = (a_j, \alpha_j, \beta_j)_{LR}$ (j = 1,...,m) are the fuzzy coefficients,
each given by its center value, left spread value, and right spread value, respectively.
In a similar way, the fuzzy independent variables are denoted by
$\tilde{X}_{kj} = (x_{kj}, \kappa_{kj}, \tau_{kj})_{LR}$ (k = 1,...,n; j = 1,...,m) and the
fuzzy dependent variable by $\tilde{Y}_k = (y_k, \delta_k, \varepsilon_k)_{LR}$
(k = 1,...,n). The model given in (4.1) can be
rewritten in matrix form using the fuzzy approximate arithmetic proposed by [10].


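$$\tilde{Y} = \tilde{X} \otimes \tilde{A}, \tag{4.2}$$

where $\tilde{Y}$ is an n×1 fuzzy matrix, $\tilde{X}$ is the n×m fuzzy data matrix and
$\tilde{A}$ is the m×1 coefficient matrix. The right-hand side of (4.2) is the approximate
multiplication of the fuzzy data matrix by the fuzzy coefficient vector, carried out by
the method proposed by [11]. Equation (4.2) can be written as follows:

$$\tilde{Y} = \tilde{X} \otimes \tilde{A} = \begin{bmatrix} \tilde{A}_1 \otimes \tilde{X}_{11} + \tilde{A}_2 \otimes \tilde{X}_{12} + \cdots + \tilde{A}_m \otimes \tilde{X}_{1m} \\ \vdots \\ \tilde{A}_1 \otimes \tilde{X}_{n1} + \tilde{A}_2 \otimes \tilde{X}_{n2} + \cdots + \tilde{A}_m \otimes \tilde{X}_{nm} \end{bmatrix} \tag{4.3}$$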


Equation (4.3) denotes the model in matrix notation containing the fuzzy
multiplication. We carried out this multiplication assuming that the coefficient matrix
$\tilde{A}$ is a positive fuzzy matrix. If another type of fuzzy coefficient matrix, such
as a negative one, is encountered, the method proposed by [7] can be applied.
In order to calculate the parameter estimates, the next phase is to
multiply both sides of (4.2) by $\tilde{X}^T$, the transpose of the fuzzy matrix
$\tilde{X}$. The new expression is given as follows:

$$\tilde{X}^T \otimes \tilde{Y} = \tilde{X}^T \otimes \tilde{X} \otimes \tilde{A}. \tag{4.4}$$

The right-hand side of (4.4) can be rewritten as follows:

$$\tilde{X}^T \otimes \tilde{Y} = (\tilde{X}^T \otimes \tilde{X}) \otimes \tilde{A}. \tag{4.5}$$


Then, the expression in parentheses on the right-hand side is also a positive
fuzzy matrix, since both matrices are positive. Based on the new definitions and the
computational procedure given in the previous section, we search for the inverse of
$\tilde{X}^T \otimes \tilde{X}$. Multiplying both sides of Eq. (4.5) by the resulting
inverse matrix gives:

$$(\tilde{X}^T \otimes \tilde{X})^{-1} \otimes \tilde{X}^T \otimes \tilde{Y} = (\tilde{X}^T \otimes \tilde{X})^{-1} \otimes (\tilde{X}^T \otimes \tilde{X}) \otimes \tilde{A} \tag{4.6}$$

where $(\tilde{X}^T \otimes \tilde{X})^{-1} \otimes (\tilde{X}^T \otimes \tilde{X}) \approx \tilde{I}$
is assumed, since the center values of the matrix consist of 1’s on the diagonal and
0’s off the diagonal.

Then, $(\tilde{X}^T \otimes \tilde{X})^{-1} \otimes \tilde{X}^T \otimes \tilde{Y} = \tilde{I} \otimes \tilde{A}$
is an equation that needs to be solved by equating
the center of the left side to the center of the right side, the left spread of the left
side to the left spread of the right side, and similarly for the right spreads of
both sides. We illustrate this approach with an example in order to show each
step of the computational procedure. The data have five observations consisting of two
independent variables and one dependent variable. Table 1 depicts the data.

Table 1 Data
$\tilde{X}_1$        $\tilde{X}_2$        $\tilde{Y}$
(8, 1, 1.5)        (8, 0.5, 0.75)     (10, 1, 1)
(6, 0.5, 0.5)      (9, 0.4, 0.6)      (8, 1, 2)
(7, 0.25, 0.5)     (6, 0.5, 0.5)      (7, 0.5, 1.5)
(9, 0, 1)          (7, 0.5, 0.5)      (9, 1, 2)
(8, 1, 2)          (6, 0, 1)          (8, 0.5, 0.5)

We begin by calculating the coefficients of the crisp linear regression in order to
determine the signs of the coefficients. The coefficients of the linear regression
based on the least squares method using the center values of the fuzzy observations are
positive numbers, namely 0.902 for $X_1$ and 0.727 for $X_2$. Therefore, the positive
multiplication rule proposed by [7] is applicable to the equations given in (4.2). Then
(4.2) is multiplied by $\tilde{X}^T$ on the left, which yields
$\tilde{X}^T \otimes \tilde{X}$ on the right side. A similar argument is valid for the
left side of (4.5). The next step is to calculate the inverse
of $\tilde{X}^T \otimes \tilde{X}$. Applying it to both sides of (4.5) results in
$\tilde{I} \otimes \tilde{A}$ on the right side and
$(\tilde{X}^T \otimes \tilde{X})^{-1} \otimes \tilde{X}^T \otimes \tilde{Y}$ on the left
side, respectively. There are many computational steps to reach the parameter estimates.
Therefore, we skip some mechanical phases and give only some important results. We
begin with the matrix $\tilde{X}^T \otimes \tilde{X}$, given as follows:

$$\tilde{X}^T \otimes \tilde{X} = \begin{bmatrix} (294, 41.5, 83.5) & (271, 34.4, 62.6) \\ (271, 34.4, 62.6) & (266, 28.2, 35.8) \end{bmatrix}$$
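
The crisp quantities used so far can be reproduced from the center values in Table 1. The Python sketch below is a cross-check, not the authors' code. It assumes that the quoted crisp coefficients come from ordinary least squares with an intercept, an assumption that reproduces 0.902 and 0.727, and it computes only the centers of the fuzzy product matrix, not its spreads.

```python
# Center values of the five observations in Table 1.
x1 = [8, 6, 7, 9, 8]
x2 = [8, 9, 6, 7, 6]
y = [10, 8, 7, 9, 8]


def dot(u, v):
    """Sum of element-wise products of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def centered(v):
    """Subtract the mean from every element."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


# Centers of the fuzzy product X^T (x) X: plain sums of products of centers.
xtx_centers = [[dot(x1, x1), dot(x1, x2)], [dot(x2, x1), dot(x2, x2)]]
print(xtx_centers)  # [[294, 271], [271, 266]]

# Ordinary least squares with an intercept (assumed): solve the 2 x 2 normal
# equations of the mean-centered data by Cramer's rule.
c1, c2, cy = centered(x1), centered(x2), centered(y)
s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
s1y, s2y = dot(c1, cy), dot(c2, cy)
det = s11 * s22 - s12 * s12
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
print(round(b1, 3), round(b2, 3))  # 0.902 0.727
```

Both coefficients are positive, which is the condition under which the positive multiplication rule is used in the example.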


Then $\tilde{X}^T \otimes \tilde{Y}$ is calculated as follows:

$$\tilde{X}^T \otimes \tilde{Y} = \begin{bmatrix} (322, 54.25, 100) \\ (305, 46.4, 80.3) \end{bmatrix}$$
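
Each entry of this vector is a fuzzy sum of approximate products of one data column with the dependent variable. As a minimal sketch, assuming the positive multiplication rule $(mn, m\gamma + n\alpha, m\delta + n\beta)$ and tuples of the form `(center, left spread, right spread)`, the first entry can be reproduced from Table 1 as follows; the function names are illustrative.

```python
# First regressor and dependent variable from Table 1,
# as (center, left spread, right spread) triples.
X1 = [(8, 1, 1.5), (6, 0.5, 0.5), (7, 0.25, 0.5), (9, 0, 1), (8, 1, 2)]
Y = [(10, 1, 1), (8, 1, 2), (7, 0.5, 1.5), (9, 1, 2), (8, 0.5, 0.5)]


def fuzzy_mul(p, q):
    """Approximate product of two positive LR fuzzy numbers."""
    m, a, b = p
    n, c, d = q
    return (m * n, m * c + n * a, m * d + n * b)


def fuzzy_dot(u, v):
    """Fuzzy sum of the element-wise approximate products of two columns."""
    center = left = right = 0
    for p, q in zip(u, v):
        c, l, r = fuzzy_mul(p, q)
        center, left, right = center + c, left + l, right + r
    return (center, left, right)


print(fuzzy_dot(X1, Y))  # (322, 54.25, 100.0)
```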


The inverse of $\tilde{X}^T \otimes \tilde{X}$, denoted by
$(\tilde{X}^T \otimes \tilde{X})^{-1}$, is calculated based on the
computational procedure given in detail in [10] and, in general terms, in the previous
section. Its result is given as follows:

$$(\tilde{X}^T \otimes \tilde{X})^{-1} = \begin{bmatrix} (0.055, |-0.009|, |-0.0001|) & (-0.056, 0.004, 0) \\ (-0.056, 0.011, |-0.002|) & (0.061, 0.011, |-0.002|) \end{bmatrix}$$


Due to the definition of LR type fuzzy numbers, the spreads of LR type fuzzy 
numbers cannot be negative. Therefore, absolute values of spreads are taken. 
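
The center values of this inverse can be cross-checked against an ordinary matrix inverse. The sketch below rests on an assumption: because the center of an approximate product is the product of the centers, the centers of the fuzzy inverse should match the crisp inverse of the center matrix. It does not reproduce the spreads, which depend on the procedure in [2].

```python
from fractions import Fraction

# Center values of the fuzzy product X^T (x) X from the worked example.
a, b = Fraction(294), Fraction(271)
c, d = Fraction(271), Fraction(266)

# Closed-form inverse of a 2 x 2 matrix [[a, b], [c, d]].
det = a * d - b * c  # 294 * 266 - 271 * 271 = 4763
inverse = [[d / det, -b / det], [-c / det, a / det]]

for row in inverse:
    print([round(float(v), 4) for v in row])
# [0.0558, -0.0569]
# [-0.0569, 0.0617]
```

These values agree with the reported centers 0.055, −0.056 and 0.061 if the latter are truncated to three decimals.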


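$(\tilde{X}^T \otimes \tilde{X})^{-1} \otimes \tilde{X}^T \otimes \tilde{Y}$ is obtained as follows:

$$(\tilde{X}^T \otimes \tilde{X})^{-1} \otimes \tilde{X}^T \otimes \tilde{Y} = \begin{bmatrix} (0.63, 7.26, 10.03) \\ (0.58, 6.69, 10.47) \end{bmatrix}$$

Therefore, the computation of the left side is complete.
$(\tilde{X}^T \otimes \tilde{X})^{-1} \otimes (\tilde{X}^T \otimes \tilde{X}) \otimes \tilde{A}$
is the right side of the computation. Then, $\tilde{I} \otimes \tilde{A}$ is
reached, which is given as follows:

$$\tilde{I} \otimes \tilde{A} = \begin{bmatrix} (0.99, 5.77, 8.13) & (0.01, 4.85, 5.47) \\ (\ldots, 5.99, 8.43) & (1.04, 5.54, \ldots) \end{bmatrix} \otimes \tilde{A}, \quad \text{where } \tilde{A} = \begin{bmatrix} (a_1, \alpha_1, \beta_1) \\ (a_2, \alpha_2, \beta_2) \end{bmatrix}$$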


Then, the equation is solved by equating each component on the left side to
each component on the right side, respectively. The solution gives the parameter
estimates.

$$\tilde{I} \otimes \tilde{A} = \begin{bmatrix} (0.63, 7.26, 10.03) \\ (0.58, 6.69, 10.47) \end{bmatrix}$$

The model is then written as follows:

$$\hat{\tilde{Y}} = (0.63, 1.12, 2.07) \otimes \tilde{X}_1 + (0.52, |-0.04|, 1.99) \otimes \tilde{X}_2$$


5 Conclusion 


In this chapter, some new notions and definitions, such as the fuzzy zero number, the
fuzzy one number, the fuzzy identity matrix and the fuzzy inverse matrix, which were
defined in our previous paper [2], are briefly introduced in order to employ them in the
computation of the fuzzy inverse matrix, an essential tool in the computation of fuzzy
linear regression in this chapter. The model investigated consists of a fuzzy dependent
variable, fuzzy independent variables and fuzzy parameters. Even though researchers
have been examining the problem for almost three decades, fuzzy arithmetic could not
be employed through all the computational steps of parameter estimation. New
notions and definitions, together with a novel computational procedure, lead to a new
parameter estimation method which is the extension of the classic least squares
calculation in matrix form. Due to the approximated multiplication procedure, the
results are obtained approximately.

A comparison of the proposed method with other methods investigating the
same problem may be suggested, and error calculations might also be requested.
However, there is a fundamental difference between the method proposed
here and the others. The available methods have used a wide range of techniques,
from mathematical programming and fuzzy least squares to interval arithmetic.
What they have in common is that they use fuzzy observations as fuzzy numbers in
some steps of the computational procedure, not in all steps. This
makes such comparisons unsound.

As a result, the classic least squares formula is extended to the fuzzy case with 
some new notions and computational procedure. In order to do this, some new 
notions, definitions and computational procedures consisting of employing FLES 
and fuzzy inverse matrix are proposed and developed. Then they are applied to 
estimate the parameters of fuzzy linear regression. 
The Healthcare System
Abstract Systemic risk is the risk of a deep transformation of the
structure of an economic and/or financial system that occurs, through a chain reaction,
after a triggering event [12, 24] (Systemic risk: a survey. CEPR
Discussion Papers, 2634, 2000; Thoughts on financial derivatives, systematic risk,
and central banking: a review of some recent developments (No. WP-99-20), 1999)
related to the dynamics of company crises, particularly pathological company
crises. By translating the concept of systemic risk to company dynamics, it is possible
to relate it to the dynamics of company crises. In the economic and business
sciences, the risk of a business crisis can be identified through predictive
models such as the Z-Score [1-5] (Journal of Finance 23(4):589-611, 1968; Bell
Journal of Economics and Management Science 4(1):184-211, 1973; Journal of
Finance 44(4):909-922, 1989; Corporate financial distress and bankruptcy, 1993;
Predicting financial distress of companies: revisiting the z-score and zeta models,
2000), defined using a multivariate approach based on the analysis of several
factors deemed significant in determining the health of a
company. The objective of the paper is to create a function that has a predictive
character in the definition of company crises. This function is defined starting
from the identification, mapping and coverage of the risk.
1 The Framework


Every day we make more or less important decisions, either automatically or on the 
basis of articulated reasoning or specific knowledge. The consequences of a wrong 
decision can in some cases only be annoying, in others even catastrophic. Although it 
is often difficult to avoid them, one can try to prevent the most negative outcomes of a 
choice, if one knows some basic mechanisms of decision-making. As complexity
increases, the ability to make a decision about a given problem (problem solving)
becomes less critical than the ability to decide which problem is the most relevant
to deal with (problem finding). In particular, in an
entrepreneurial context, both private and public, it is necessary to be able to recognize
and perceive the symptoms of a corporate crisis. Thanks to this early recognition,
the entrepreneur will be able to face the crisis adequately, avoiding,
if possible, the failure of the company [14]. There are various diagnostic tools that
can be used and among the best known is the analysis of financial statements by 
indexes. The signalling capacity of the economic-financial indexes has been verified 
in numerous researches, starting from the studies of [15], and of [1], up to the most 
recent and sophisticated in-depth analysis concerning the forecast of insolvencies. In 
particular, the literature concerning the risk forecasting models for large companies 
is very vast and essentially revolves around two main methodologies: the Altman 
Z-Score, which consists in predicting insolvency using historical balance data [1] 
and the methodologies that rely on information based on market data [31]. The
identification of a company's operational and financial difficulties has long been
discussed; the first studies are those of [32]. The author concluded
that companies at risk of failure showed significant differences in financial values 
compared to healthy ones. One of the most important works on the subject was that 
of [15], which dealt with bankruptcy forecasts and financial index analysis. In partic- 
ular, he understood that a number of indicators could be able to separate companies 
at risk of bankruptcy from healthy companies, with up to 5 years notice. Altman’s Z- 
Score [1] follows this procedure exactly, using a multivariate statistical analysis. The 
model is used to assess the probability of bankruptcy of a company and is particularly 
important if it is dynamically used in the analysis of business fundamentals over the 
last 5 years. The values of the Z-Score, whose thresholds differ for large
production companies, production SMEs and SMEs in other sectors, indicate a situation
of good equilibrium, a precarious balance or a probable failure. This
evidence has prompted further studies. For example, a 1972 study by
Deakin uses the same 14 variables previously analyzed by Altman, applying them
within a series of multivariate discriminating models. Kaplan, in 1978,
shows that financial indicators have strong predictive power: the
indicators that measure profitability, liquidity and solvency are the most significant.
However, a single indicator is not enough to check the status of a company, because
many factors lead to the state of insolvency. An analysis based on indicators
can be very useful, but it cannot be fully generalized. These models are
therefore used to analyze the risk of insolvency in private companies. From the analysis
of the literature it appears that the risk of insolvency has never been studied in
public enterprises. This paper aims to fill this gap by using the Altman
model [1, 6-11] as a starting point, adapting it to the needs of a public budget. Our
focus is to define the insolvency risk in the Italian regions, with particular reference
to the health sector, through the analysis of the dedicated budgets. Another interesting
approach can be found in, e.g., [20, 27].

The current regulatory framework of Italian Healthcare is going through a phase
of profound changes, fuelled by forces that sometimes move in opposite directions:
on the one hand, the transfer of powers from the State to the Regions, which
thus become responsible for applying the essential levels of assistance of
which the State is the guarantor; on the other, the European Community imposes the
need to harmonize national public accounts and monitor the spending methods of the
resources allocated to the Regions. The institutional framework that has thus emerged
is therefore characterized by a strong complexity: a plurality of actors such as the 
central Government, the Regions, healthcare and hospital companies are called upon 
to optimize their behavior and experiment with new forms of organization, regulation 
and production of services, as well as redefining their short, medium and long-term 
strategies. Among the recent regulatory changes on the subject, Legislative Decree 
no. 118/2011 represents an initiative with a strong impact on the harmonization of the 
accounting systems of the Regions, Local Authorities and their bodies, in particular 
Healthcare Institutions. In this context, the economic-patrimonial accounting system, 
becoming indispensable for the management of the companies of the National Health- 
care Service, assumes an increasingly important role, given the need to appreciate 
the economic dimension of the results in the public healthcare governance process. 
In this context, the reforms of the early Nineties are inserted, which have had the aim 
of containing and rationalizing spending, moving towards greater responsibility for 
the Regions, also in consideration of the Agreements for the implementation of the 
Return Plans from the healthcare deficit signed by some regional administrations. 
It is clear that the main objective of the Return Plans is to improve the quality of
healthcare services, and therefore the assistance given to citizens, in order
to guarantee the sustainability of the system itself. The services that the system
provides are the core of the National Healthcare System; their loss
therefore represents a systemic risk for the entire organization [12].
Healthcare is by its nature a risky activity; a systemic approach to risk
management, oriented towards the process and the organization rather than the
individual actor, is therefore considered useful. Our research aims to identify, through the
analysis of the healthcare budgets (the so-called centralized healthcare management) of
regions both inside and outside the recovery plan, and with the use of a Z-Score
discriminant analysis model, the causes that have led the Italian regions into a situation
of strong deficit. The objective is to determine and try to predict the risk signals present
in the healthcare sector and to implement their management [34].

The present work is structured in four parts: the first analyzes the context and the
scope of application, highlighting the main novelties concerning the national
healthcare sector and the risks associated with it; the second describes the methodology
used for the analysis of the case study; the third illustrates the application and
reports the main results obtained; the last closes the work and highlights
future prospects.


2 Aims of the Application: State of the Art 
and Individuation of Risk Factors 


Italy, as well as the European Union as a whole, is going through a period of prolonged
economic slowdown, accompanied by a phase of important budgetary restrictions,
which in turn translate into policies aimed at limiting or reducing spending on
healthcare services.

Healthcare is an important sector of the public budget and as such it is inevitably
subject to repeated containment interventions, especially in times of crisis, with the
aim of raising cash, or of reducing current expenditure in the short term
and postponing investment spending (in infrastructure and technology) [24, 35, 36].
The restrictions imposed on healthcare are also affected by the misunderstandings 
suffered by the sector. Public healthcare is in fact often perceived, especially in 
recent years, as a sector of expenditure, with high costs, low levels of productivity 
and widespread inefficiencies, as well as with little return to the economy (also due 
to the difficult measurability of its economic impact and long lead times for positive 
effects). On the contrary, the healthcare sector is able to make a great contribution
not only to the well-being of people but also to the economy and growth, given
its importance as a source of employment and income, its widespread distribution
throughout the territory, its fundamental role in maintaining a healthy workforce,
its ability to improve the level of health of future generations, and its role as an
important area of scientific research and technological innovation.

In recent years, the National Healthcare System has been the protagonist of a 
profound transformation that has led to a necessary rethinking of the Italian insti- 
tutional model and has brought together the Organizations that are part of it with 
completely different tools and logics. One of the structural problems that character- 
ized the National Healthcare System was the constant presence of regional deficits 
and in particular the concentration of these deficits in some Regions: healthcare 
management represents the main item of expenditure in the budgets of the individual 
Regions which must guarantee the economic and financial balance. This constraint 
has not always been respected and it is in order to guarantee the achievement of the 
public finance objectives that the legislator has intervened, introducing the obligation 
for the regions in serious financial deficit to define, together with the Ministries of 
Healthcare and of the Economy and finance, an operational program for reorgani- 
zation, redevelopment or strengthening of the regional healthcare service (so called 
re-entry plan). Each Return Plan must identify the interventions necessary for the 
pursuit of economic and financial balance, without prejudice to compliance with 
the essential levels of performance. The first Agreements for the Return Plans were
signed in 2007 for the three-year period ending in 2009, followed by new Agreements
for the 2010-12 period and Operational Programs for the following years. Seven
regions, all in the center-south, are currently subject to healthcare Return Plans:
Puglia, Abruzzo, Sicily, Calabria, Campania, Lazio and Molise. The Return Plans, i.e.
the operational programs for the reorganization, redevelopment and strengthening of
the regional healthcare service, originate from the 2005 Budget Law and are attached
to agreements stipulated by the Healthcare and Economy Ministers with the individual
Regions. The Return Plans have the purpose
of restoring the economic-financial balance of the region in healthcare deficit and 
are an integral part of the agreement stipulated between the Ministry of Healthcare, 
the Ministry of Economy and Finance and the Region, aimed at reaching, within 
three years from the signing, a balanced budget, through a stringent rationalization 
of spending programs, a structural intervention on the overall offer of healthcare 
services and the introduction of sanctions for non-compliant regions. The content of 
the Plan must include interventions aimed at rebalancing the disbursement profile 
of the essential levels of assistance, to make it conform to that inferable from the 
National Healthcare Plan and from the decree of the President of the Council of 
Ministers to fix the same essential levels of assistance, and the measures necessary 
for zeroing the deficit. Signing a Plan is mandatory in the event of
deficits exceeding a specific threshold of total financing, initially set at 7% and then
lowered to 5% starting from 2010. The Plans last for three years, but if at the end of
the three-year period the objectives set in the Plan have not been achieved,
they can be extended for a further three-year period by submitting a new Return
Plan, to be agreed with the Ministry of Healthcare, together with the Ministry of the
Economy and Finance. If the deviation from the set objectives remains significant,
the same rule provides for the appointment of a regional commissioner; in this case,
some restrictions on personnel and tax rates expressly provided for the Regions in the
Return Plan are applied more stringently and automatically. The Regions in the Return Plan
are subjected to periodic monitoring, aimed at analyzing the actual implementation 
of the commitments undertaken and the effects on the containment of costs and 
on the level of the essential levels of assistance; in the event of failure to achieve 
the objectives, the suspension of the disbursement of the portion of the indistinct 
loan, represented by the bonus portion, is envisaged. It is clear that the main
objective of the Return Plans (PDRs) is to improve the quality of healthcare services
and therefore to improve the assistance given to citizens in order to guarantee the
sustainability of the system itself. The services that the system provides are the
core of the National Healthcare System; their loss therefore represents a
systemic risk for the entire organization. Healthcare is by its nature a risky
activity; a systemic approach to risk management is therefore considered useful,
oriented towards the process and organization rather than the individual actor. The
management of the risk profile of the healthcare company presupposes, in this sense, 
the definition of an organized and conscious, systemic and continuous intervention, 
which combines activities and decisions of a strategic nature and an operational 
management phase. The characterizing and priority part of the risk profile of
healthcare companies can therefore be identified in the loss of services: an
adverse event (damage or inconvenience) that leads to the worsening or loss of the
service itself. At the base of every adverse event it is always possible to identify one
or more errors, committed by individuals or by the organizational system (processes,
procedures, etc.). Many of the errors (and adverse events) could, at least in theory, be
managed through the activation of a cycle of identification, analysis
and management of risks. Our research aims to identify, through the analysis
of the healthcare budgets (the so-called centralized healthcare management) of
regions both inside and outside the recovery plan, and with the use of a Z-Score
discriminant analysis model [1], the causes that have led the Italian regions into a
situation of strong deficit. The objective is to determine and try to predict the risk
signals present in the healthcare sector and to implement their management. Risk
management, in turn, is based on the knowledge of the elements that constitute it
[22, 23, 26, 29, 37]. These elements can be described as the set of threats in which the
risks materialize (i.e. the source of the risk), the corporate resources affected by the
threat (the various corporate subsystems exposed to the risk), the vulnerabilities that
make the threatened resources easier to attack (weaknesses that can increase the
likelihood that the threat will materialize) and, finally, the “consequences” of the
occurrence of the threat (the set of effects on all components of the business system).


3 Methodology 


The methodology used is discriminant analysis [18, 21], which is aimed at
identifying the combination of indicators that best separates two sets.
It is essentially based on the hypotheses that the distribution of the indicators
is multivariate normal, that the means of the two groups are significantly
different and that the variance-covariance matrices of the two groups are equal.
In particular, in order to apply this type of analysis it is necessary to:


Build a sample composed of healthy companies and insolvent companies

Identify variables whose informative content allows us to discriminate
healthy companies from insolvent ones

Identify discriminating coefficients

Define the sample and the variables to be inserted in a discriminating function
that helps to determine the score of each company considered.


In particular, for our analysis we have chosen the Z-Score from the family of
discriminant analysis models. The Z-Score is an index used in discriminant analysis to
determine the probability of business failure through a statistical technique. It was
developed by analyzing the balance sheet data of 33 failed companies and 33 solid
companies, with a degree of accuracy of 95% [1]. The discriminating function used by the
Z-Score differs according to the type of company considered; for example, for listed
companies:


$$Z = 1.2\,X_1 + 1.4\,X_2 + 3.3\,X_3 + 0.6\,X_4 + 0.99\,X_5$$

where the following coefficients and discriminating variables are considered:


Discriminating coefficients    Discriminating variables

a1 = 1.2                       X1 = Net working capital / Total assets
a2 = 1.4                       X2 = Undistributed profit / Total assets
a3 = 3.3                       X3 = EBIT / Total assets
a4 = 0.6                       X4 = Market value of equity / Total debts
a5 = 0.99                      X5 = Revenue / Total assets


The Z-score obtained is compared with the cut-off point, which is the discriminating
value between companies in danger of insolvency and healthy companies.
In practice there is a band around the cut-off within which it is not possible to give a
reasonable judgment. This «gray zone» is an area of uncertainty that does
not allow us to formulate a reliable judgment.

In this sense, for the Altman case taken as an example, the cut-off zone is:


Area of insolvency     Zone of uncertainty     Solvency

Z' < Z' cut off        Z' near Z' cut off      Z' > Z' cut off


Z' cut off = 2.675
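
As a minimal sketch of how the listed-company function above is evaluated, the following Python snippet computes Z from the five ratios and compares it with the 2.675 cut-off. The balance sheet figures are invented purely for illustration, and the names `altman_z`, `ALTMAN_COEFFICIENTS` and `Z_CUT_OFF` are assumptions introduced here, not part of the original model; the width of the gray zone is deliberately not modelled.

```python
# Coefficients a1..a5 as reported in the text for listed companies.
ALTMAN_COEFFICIENTS = (1.2, 1.4, 3.3, 0.6, 0.99)
Z_CUT_OFF = 2.675  # cut-off point quoted in the text


def altman_z(ratios, coefficients=ALTMAN_COEFFICIENTS):
    """Return Z as the weighted sum of the five ratios X1..X5."""
    if len(ratios) != len(coefficients):
        raise ValueError("expected one ratio per coefficient")
    return sum(a * x for a, x in zip(coefficients, ratios))


# Hypothetical company: every figure below is invented for illustration.
total_assets = 1000.0
ratios = (
    150.0 / total_assets,   # X1: net working capital / total assets
    200.0 / total_assets,   # X2: undistributed profit / total assets
    120.0 / total_assets,   # X3: EBIT / total assets
    900.0 / 600.0,          # X4: market value of equity / total debts
    1100.0 / total_assets,  # X5: revenue / total assets
)

z = altman_z(ratios)
side = "above" if z > Z_CUT_OFF else "below"
print(f"Z = {z:.3f}, {side} the cut-off of {Z_CUT_OFF}")
```

A score above the cut-off points towards solvency and a score below it towards insolvency, with scores close to the cut-off falling in the zone of uncertainty.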


However, the discriminating function built by Altman is not applicable to the case
study analyzed here, as there are substantial differences between the budgets
of private companies and those of public bodies. In particular, the public budget
indicates the assets and liabilities that refer to a specific period of time, is expressed
in cash inflows and outflows and uses the cash accounting principle; the
private balance sheet instead summarizes the financial, equity and economic performance
of the company and uses the accrual accounting principle [25]. In light of these
considerations it was necessary to construct ad hoc variables and coefficients. Since the
literature offers no unanimously shared and tested alternative reference model,
the Altman model in its original form was flanked by a model built on other
independent variables, selected taking into account their initial normalization, the
level of multicollinearity among the variables, the statistical significance of the
parameters and the consistency of the sign in terms of economic causality
[28, 30].

Based on the information available, 12 indicators have been constructed, covering
financial structure and solidity, liquidity, productivity, profitability, and the
incidence of financial and extraordinary management.

These indicators were selected, first, by preferring those for which
a normalization was possible, so as not to alter the actual weight of each piece of
information in the subsequent data processing and, second, by excluding
indices whose construction was clearly a linear combination of indices already
included. Favouring information of a corporate nature on
the greater or lesser probability of default, the values assumed by the indicators
have been normalized while maintaining the comparison between balance sheet data
instead of proceeding statistically by relating them to the maximum within the value
vector of the index obtained for each statistical unit.

The normalization was therefore carried out by first identifying the maximum
value of each index for each year examined; the value of each individual index was then
divided by that maximum, thus obtaining the normalized indices. After having
calculated and normalized for each region all the indicators calculable in the light of
the data available after the balance sheet reclassifications, we identified, based on
statistical tests and our own assessments, a reduced subset of 5 variables on
which to estimate the model.
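
The normalization step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `normalize_by_yearly_max`, the region labels and the raw values are all hypothetical, and the yearly maximum is assumed to be positive.

```python
def normalize_by_yearly_max(values_by_region):
    """Divide each region's value of one index by the maximum observed that year.

    values_by_region maps region -> {year: raw index value}.
    The yearly maximum is assumed to be positive.
    """
    years = {year for series in values_by_region.values() for year in series}
    yearly_max = {
        year: max(
            series[year] for series in values_by_region.values() if year in series
        )
        for year in years
    }
    return {
        region: {year: value / yearly_max[year] for year, value in series.items()}
        for region, series in values_by_region.items()
    }


# Hypothetical raw values of a single index (not taken from the chapter).
raw = {
    "Region A": {2014: 0.8, 2015: 1.2},
    "Region B": {2014: 0.4, 2015: 0.6},
    "Region C": {2014: 1.6, 2015: 0.3},
}
normalized = normalize_by_yearly_max(raw)
for region, series in normalized.items():
    print(region, series)
```

With this scheme the region holding the yearly maximum gets the value 1 and all other regions get a value between 0 and 1, so indices measured on different scales become comparable.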

The discriminating coefficients a are constructed as:


$$a = (\bar{X}_1 - \bar{X}_2)\, S^{-1} \tag{1}$$


where:


$\bar{X}_1$ = vector of the averages of the regions not in the return plan;

$\bar{X}_2$ = vector of the averages of the regions in the return plan;

$S^{-1}$ = inverse of the sample variance-covariance matrix.
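
To make formula (1) concrete, here is a small pure-Python sketch. It assumes that S is the pooled within-group covariance matrix, which is the usual choice in two-group linear discriminant analysis but is not stated explicitly in the text; all function names and the two toy groups are hypothetical. Because S is symmetric, the row vector $(\bar{X}_1 - \bar{X}_2) S^{-1}$ has the same entries as the solution of $S a = \bar{X}_1 - \bar{X}_2$, so the code solves that linear system instead of inverting S.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equally long observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(group_1, group_2):
    """Pooled within-group covariance matrix (an assumption about S)."""
    p = len(group_1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (group_1, group_2):
        m = mean_vector(rows)
        for row in rows:
            d = [x - mu for x, mu in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(group_1) + len(group_2) - 2
    return [[v / dof for v in row] for row in s]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def discriminant_coefficients(group_1, group_2):
    """Coefficients a of formula (1) for two groups of observations."""
    diff = [a - b for a, b in zip(mean_vector(group_1), mean_vector(group_2))]
    return solve(pooled_covariance(group_1, group_2), diff)


# Toy data with two indicators per unit (invented for illustration).
not_in_plan = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
in_plan = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
print(discriminant_coefficients(not_in_plan, in_plan))
```

The sketch raises an error when S is singular, which happens whenever the number of indicators is too large relative to the number of observations; in that case the coefficients cannot be obtained from formula (1) as written.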


Having identified both the discriminant variables X and the discriminating coefficients
a, it is possible to reconstruct the discriminant function to be used in the case
study:


$$Z = a_1 X_1 + a_2 X_2 + a_3 X_3 + a_4 X_4 + a_5 X_5 \tag{2}$$


At this point it is necessary to determine a cut-off point that allows us to separate
insolvent companies from healthy ones. We can take as the cut-off point the value
exactly halfway between the two averages of the score calculated for the two groups
(healthy and insolvent). Therefore, if we denote by $\bar{Z}_1$, $\bar{Z}_2$ and $Z_c$,
respectively, the average score of group 1, the average score of group 2 and the
threshold value, we have:


$$Z_c = \frac{\bar{Z}_1 + \bar{Z}_2}{2} \tag{3}$$

The cut-off point is positioned in the so-called grey area, i.e. an interval in which
for the same scores we find both healthy and insolvent companies. Obviously, the
smaller this area, the more effective the model, since the
probability of misclassification is lower.
4 Case Study


The sample we examined follows the typical design of the original
Altman model [1], which is based on equal numbers (50-50) in the two subsamples;
this choice is dictated by various reasons, including the need to reduce the variance
of the coefficient estimates, the need to identify the differences between the groups
more efficiently and the containment of the costs of selecting
the companies whose data are processed by the model.

The reference year of our analysis starts from 2012 as Legislative Decree 118/2011 
introduces a new regional organizational structure, called «Centralized Healthcare 
Management», specifically established in relation to the management of regional 
healthcare expenditure. The provisions introduced are aimed at ensuring uniformity 
of the healthcare accounts of the Regions and healthcare agencies to ensure trans- 
parency through the immediate comparison between the revenues and expenses of 
the sector and the verification of the additional resources available to the Regions 
for financing the healthcare service. Given the difficulty of finding at least three
years of balance sheet data on the centralized healthcare management of
regions inside and outside the Return Plan, our investigation was restricted to three
regions in the Return Plan (Campania, Molise and Sicily) and three regions not in
it (Marche, Puglia and Umbria). The regions not in the Plan were chosen
taking into account the population (inhabitants) of the area and its density
(inhabitants/km²), assuming that the population benefits from healthcare services in the
same way. The data for this analysis were extracted from the official websites of the
regions, in the section dedicated to the budgets of centralized healthcare management.

The analysis started with the reclassification of the financial statements of the 
aforementioned regions: the balance sheet was reclassified according to the “financial 
criterion” and the income statement was reclassified as added value. The reference 
years, based on the goodness of the available data, are 2014, 2015 and 2016. The
analysis then turned to the calculation of the following indices, which were
subsequently normalized:


Financial structure and financial solidity indexes;

Liquidity indexes;

Productivity indexes;

Profitability indexes;

Incidence of financial and extraordinary management indexes.


The discriminant variables X identified for our case study are:


X1 = Cash fund + active residuals / Total healthcare expenditure;

X2 = Constrained surplus / Total healthcare expenditure;

X3 = Σ incoming (Tit. I, II, III) / Total healthcare expenditure;

X4 = Total revenues / Total healthcare expenditure + residual liabilities;

X5 = Total SSR revenues / Total healthcare expenditure.
From the application of formula (1) we have identified the following
discriminating coefficients, reported by year:


Table 1 Discriminating coefficients

        2014    2015    2016
a1      0.49    0.69    0.53
a2      1       0       0
a3      0       1       1
a4      0.47    0.55    0.49
a5      0.47    0.48    0.50


The discriminating coefficients, shown in Table 1, have been normalized by
first subtracting the minimum from the maximum value and then dividing by the
maximum value so that the values were between 0 and 1.


At this point it was possible to construct the discriminant function (2): 


$$Z_{2014} = 0.49\,X_1 + 1\,X_2 + 0\,X_3 + 0.47\,X_4 + 0.47\,X_5$$
$$Z_{2015} = 0.69\,X_1 + 0\,X_2 + 1\,X_3 + 0.55\,X_4 + 0.48\,X_5$$
$$Z_{2016} = 0.53\,X_1 + 0\,X_2 + 1\,X_3 + 0.49\,X_4 + 0.50\,X_5$$


To this end, the cut-off zone was defined using formula (3) (Table 2):


Table 2 Cut off point

        Z' cut off
2014    0.25
2015    0.33
2016    0.47


At this point it is possible to present the results obtained; for each single region we
have calculated the Z-score as a linear combination of the initial indexes that gives us
an indication of the level of default risk of the regional healthcare system (Table 3):


Table 3 Z-score

            2014    2015    2016
Molise      0.01    0.34    0.67
Sicily      0.14    0.10    0.53
Campania    0.59    0.67    0.79
Marche      0.35    0.32    0.12
Puglia      0.27    0.47    0.54
Umbria      0.16    0.10    0.19
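
As a quick consistency check, formula (3) can be applied to the Z-scores of Table 3. The sketch below uses the grouping of the case study (Campania, Molise and Sicily in the Return Plan; Marche, Puglia and Umbria not in it); the function name `cut_off` and the data layout are assumptions made for illustration. Since the inputs are the rounded values printed in Table 3, the results are rounded to two decimals as well.

```python
from statistics import mean

# Z-scores as reported in Table 3.
Z_SCORES = {
    "Molise":   {2014: 0.01, 2015: 0.34, 2016: 0.67},
    "Sicily":   {2014: 0.14, 2015: 0.10, 2016: 0.53},
    "Campania": {2014: 0.59, 2015: 0.67, 2016: 0.79},
    "Marche":   {2014: 0.35, 2015: 0.32, 2016: 0.12},
    "Puglia":   {2014: 0.27, 2015: 0.47, 2016: 0.54},
    "Umbria":   {2014: 0.16, 2015: 0.10, 2016: 0.19},
}
# Grouping used in the case study.
IN_RETURN_PLAN = ("Campania", "Molise", "Sicily")
NOT_IN_RETURN_PLAN = ("Marche", "Puglia", "Umbria")


def cut_off(z_scores, group_1, group_2, year):
    """Formula (3): midpoint between the average scores of the two groups."""
    z_bar_1 = mean(z_scores[region][year] for region in group_1)
    z_bar_2 = mean(z_scores[region][year] for region in group_2)
    return (z_bar_1 + z_bar_2) / 2


for year in (2014, 2015, 2016):
    value = cut_off(Z_SCORES, NOT_IN_RETURN_PLAN, IN_RETURN_PLAN, year)
    print(year, round(value, 2))
```

The three rounded values (0.25, 0.33 and 0.47) agree with the cut-off points listed in Table 2.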
Comparing the Z-scores of each region with the cut off point for each year
considered, it results that:


Table 4 Cut off zone

            2014            2015            2016
Molise      Default risk    Grey zone       Solvency
Sicily      Default risk    Default risk    Grey zone
Campania    Solvency        Solvency        Solvency
Marche      Solvency        Grey zone       Default risk
Puglia      Grey zone       Solvency        Grey zone
Umbria      Default risk    Default risk    Default risk

Table 4 reports the situations of Solvency, Grey zone and Default risk.
The Grey zone comprises the values very close to the cut off point. The
further a value is from the cut off point, the more indicative it is of the
specific situation. In particular, Sicily remains in a situation of default risk or
grey zone from 2014 to 2016. Molise improved its situation from 2014 to 2015
and appears to be moving towards solvency. The case of Campania is interesting:
in 2014 it was still in the recovery plan and, in light of the results obtained, it
seems to be maintaining a Solvency situation. The analysis also shows critical
issues for the healthy regions; in particular, Umbria shows a risk of default
in all three years analyzed.
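
The classification in Table 4 can be sketched in a few lines. The chapter does not say how close to the cut-off a score must be to fall in the grey zone, so the half-width of 0.08 used below is purely an assumption: it was picked because, applied to the rounded values of Tables 2 and 3, it happens to reproduce Table 4 (any value from about 0.07 to just under 0.09 would do the same). The names `classify` and `GREY_HALF_WIDTH` are likewise hypothetical.

```python
# Cut-off points from Table 2 and Z-scores from Table 3.
CUT_OFF = {2014: 0.25, 2015: 0.33, 2016: 0.47}
Z_SCORES = {
    "Molise":   {2014: 0.01, 2015: 0.34, 2016: 0.67},
    "Sicily":   {2014: 0.14, 2015: 0.10, 2016: 0.53},
    "Campania": {2014: 0.59, 2015: 0.67, 2016: 0.79},
    "Marche":   {2014: 0.35, 2015: 0.32, 2016: 0.12},
    "Puglia":   {2014: 0.27, 2015: 0.47, 2016: 0.54},
    "Umbria":   {2014: 0.16, 2015: 0.10, 2016: 0.19},
}
GREY_HALF_WIDTH = 0.08  # assumption: not stated in the chapter


def classify(z, cut_off, half_width=GREY_HALF_WIDTH):
    """Label a score as Grey zone, Solvency or Default risk."""
    if abs(z - cut_off) <= half_width:
        return "Grey zone"
    return "Solvency" if z > cut_off else "Default risk"


table = {
    region: {year: classify(z, CUT_OFF[year]) for year, z in scores.items()}
    for region, scores in Z_SCORES.items()
}
for region, row in table.items():
    print(region, row)
```

Making the half-width an explicit parameter shows how sensitive the labels are to it: several scores lie within 0.1 of the cut-off, so a slightly wider or narrower band would move regions between the grey zone and the other two classes.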


5 Conclusion 


Risk management has always had to deal with the risk of default.
Academic research is unanimous in highlighting the need to build, from the balance
sheet items, a set of short- or long-term performance indicators, and has therefore
produced over the years a multitude of indices with different cognitive objectives and
levels of detail. However, the plurality of indicators available to those who must analyze
the financial statements and, in particular, estimate the risk of default, does not
correspond to an equally vast set of observable phenomena. With specific reference
to the public sector and in particular in our case to the healthcare sector, it is clear 
that many of the balance sheet ratios used for private balance sheets do not capture 
the critical issues related to the reference sector. Setting ourselves the ambitious
goal of defining a rigorous statistical model for assessing the risk of default in the
public sector, with a focus on the healthcare sector, we analyzed the predictive
capacity of the financial statements by identifying and defining ad
hoc indexes capable of measuring more accurately the state of health of a public
administration. The models, in fact, need adequate customizations and refinements
capable of grasping the specificities of the sector under study. In this sense, with
due regard for the case under study, we defined ad hoc discriminating variables and
coefficients in order to apply the Altman Z-score model to the public administration
(PA). The work presents
several innovative features: the adopted analysis method represents an interesting and 
appropriate evolution of the traditional discriminating analysis models used in the 
public sector; particular attention to the analysis of financial statement indices in 
the public sector led to the reduction of the basket of variables considered in the 
models without affecting the forecast performances; the results of the estimated 
coefficients allow the quantification of the effects of the variations of the indices on 
the measure of the probability of default. The results of our elaborations show a high 
predictive capacity of the model in discriminating healthy companies from those in 
bankruptcy. However, by means of the validation sample study, it is found that some 
healthy companies are assigned a high probability of failure on the basis of the model. 
Furthermore, each statistical method has a limit due to the quality of the data. In this 
context, this means having reliable and timely information to predict any critical 
issues both at the level of individual regions and specific healthcare systems and 
economic systems. This work appears to be a pioneer in this field of application; we
therefore believe that the model can be improved by applying
further statistical models beyond the Z-score to verify the goodness of the results
obtained. In light of this, we plan, on the basis of the results we will obtain,
to define a support model for the decision-making processes of regional governance
within the healthcare system; for this purpose we would like to use multi-criteria
decision-making methods (MCDM), specifically Saaty’s AHP method [13, 16, 17, 19, 33].
Abstract Every time some judges are asked to express their preferences on a set 
of objects, we deal with ranking data. Nowadays, the analysis of such data arises in 
many scientific fields, such as computer science, social sciences, political 
sciences and medical sciences, just to cite a few. For this reason, interest in the rank 
aggregation problem is growing in the research community. The main issue in 
this framework is to find a ranking that best represents the synthesis of a group of 
judges. In this chapter we propose to use, for the first time, a memetic algorithm 
to address this complex problem. The proposed memetic algorithm extends 
genetic algorithms with hill climbing search. As shown in an experimental session 
involving well-known datasets, the proposed algorithm outperforms the evolutionary 
state-of-the-art approach. 
1 Introduction 


Nowadays, preference rankings have become widespread in several fields of analysis, 
such as political sciences, computer sciences, marketing, etc. Preference data are 
generally expressed by either “ratings” data or “rank” (or rankings, or preference 
rankings) data. Both express an individual's preferences over a set of available 
alternatives. A rating scale is a method that requires the rater to assign a value, 
sometimes numeric, to the rated object, as a measure of some rated attribute. A 
ranking is a relationship between a set of objects (items, sentences) such that, for any 
two objects, the first is either 'ranked higher than', 'ranked lower than' or 'ranked 
equal to' the second. Within the preference rankings framework [1] these data can be 
represented through rank vectors (or simply rankings), or order vectors (or simply 
orderings). Suppose there are n items to be ranked by m judges: a ranking lists the 
ranks given to the objects, in which 1 means the best and n means the worst. An 
ordering, instead, lists the objects in order from the best to the worst. Suppose we 
have four objects to be ranked, and suppose that these objects are labeled A, B, C 
and D. Then, the ordering <B A D C> corresponds to the ranking (2 1 4 3). When the 
judge assigns the integer values from 1 to n to all the n items, we have a complete (or 
full) ranking. When instead a judge fails to distinguish between two or more items 
and assigns to them the same integer number, we deal with tied (or weak) rankings. 
They are also known as bucket orders [2, 3]. We have a partial ranking [1, 4, 5] when 
judges are asked to rank a subset of the entire set of objects (e.g., pick the k most 
favorite items out of a set of n, k < n). Moreover, there are incomplete rankings if 
the rank for some objects is not assigned by some judges. It is worth noting that the 
case of partial rankings is included in the case of incomplete rankings. Throughout 
the chapter we use the terms rankings and orderings interchangeably, with both 
meaning rank vectors. 
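
The correspondence between orderings and rank vectors can be sketched in a few lines of Python. This is only an illustration for full rankings without ties; the function names `ordering_to_ranking` and `ranking_to_ordering` are hypothetical and do not come from the chapter.

```python
def ordering_to_ranking(ordering, labels):
    """Return the rank vector (aligned with `labels`) for a full ordering."""
    # Position 1 is the best object, position n the worst.
    position = {item: rank for rank, item in enumerate(ordering, start=1)}
    return [position[label] for label in labels]


def ranking_to_ordering(ranking, labels):
    """Return the objects sorted from best (rank 1) to worst for a full ranking."""
    return [label for _, label in sorted(zip(ranking, labels))]


labels = ["A", "B", "C", "D"]
ranking = ordering_to_ranking(["B", "A", "D", "C"], labels)
print(ranking)                               # [2, 1, 4, 3]
print(ranking_to_ordering(ranking, labels))  # ['B', 'A', 'D', 'C']
```

The example reproduces the case given in the text: the ordering <B A D C> maps to the ranking (2 1 4 3).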

Most of the common statistical techniques, methods and models can be used 
for analyzing preference data. Both supervised and unsupervised statistical learning 
methods can be used. Ratings and rankings cannot be treated in exactly the same way: 
there exist methodologies to analyze rankings and methodologies to analyze ratings. 
Two types of methods belong to the former. The first is based on badness-of-fit 
functions describing the multidimensional structure of rank data. This category 
includes statistical methods such as multidimensional scaling [6], unfolding methods [7, 
8, 10], preference mapping methods [12, 13] and principal component analysis [11, 20, 
21]. The second is based on probabilistic models [1, 4], modeling either the 
ranking process or the population of rankers. These methods include distance-based 
models [14, 15, 18, 19], multistage models [16] and Thurstonian models [17]. 

Methods that model the population of rankers assume heterogeneity among the 
judges, with the goal of identifying homogeneous sub-populations. They include 
mixtures of Bradley-Terry-Luce models, mixtures of distance-based models [22-24], 
sorting insertion rank models and K-median cluster component analysis [25]. 

Among the proposals that include covariates, the majority are based either on 
generalized linear models [26-29] or on recursive partitioning methods [30-34]. 
An important task in the analysis of rank data is searching for a ranking that is 
representative of a population (or of sub-populations) of judges, regardless of whether 
homogeneity or heterogeneity among the judges is assumed or whether a probabilistic 
model is used. The identification of the ranking (or rankings) that best 
represents the synthesis of a group of judges is the detection of the so-called consensus 
or median ranking [35]. The median ranking is defined as the ranking (or the rankings) 
that minimizes the sum of distances between itself and all input rankings. Since 
the search space has the same size as the set of full permutations of the objects 
to be ranked, finding the median ranking is an NP-hard (non-deterministic 
polynomial-time hard) problem. In addition, if there are either incomplete or partial 
rankings, the problem becomes more complex because the size of the universe of 
rankings increases. 

Due to the complexity of the median ranking problem, in recent years evolutionary 
algorithms have emerged as a suitable methodology to address it. 
In particular, an approach based on the Differential Evolution algorithm, called 
DECoR, has recently been proposed in [41]. This approach yields very accurate solutions 
for a large number of objects in reasonable computing time: almost instantly for up to 50 
objects, in a few seconds for up to 100 objects, and in a few minutes for more than 200 
objects. These results make it the evolutionary state-of-the-art algorithm for the rank 
aggregation problem. However, although evolutionary algorithms have been applied in 
several domains with good results, they can be characterized by slow convergence. To 
overcome this drawback, a new kind of algorithm, named 
Memetic Algorithms (MAs) [54], has emerged in recent years. Thanks to the marriage 
between global and local search, MAs succeed in finding high-quality solutions faster 
than conventional evolutionary algorithms. 

Starting from these considerations, the chapter proposes to apply, for the first time, 
an MA to solve the rank aggregation problem. The proposed MA is a combination 
of genetic algorithms and the stochastic version of the hill climbing search. 
As shown by a set of experiments performed on well-known real datasets, 
our proposal outperforms the evolutionary state-of-the-art algorithm. 

The chapter is organized as follows. Section 2 introduces the rank aggregation 
problem. Section 3 describes the proposed MA for the rank aggregation 
problem. Section 4 shows the results of the experimental session, before the 
conclusions in Sect. 5. 


2 Rank Aggregation Problem 


One of the main problems when working with rankings is the detection of the 
ordering that best represents and synthesizes the opinions of a sample of judges when 
they have to order a set of items. Depending on the reference framework, finding a 
ranking which most closely agrees with the set of the input rankings is known by 
different names. Indeed, in the field of computer science [39], the consensus ranking 
problem is best known as the rank aggregation problem. Given a series of judgments 
about a set of n objects by a group of m judges, finding the ranking that best represents 
the consensus opinion is a very old problem. In the framework of ad hoc methods 
to obtain a consensus, [37, 38] proposed the most famous non-elimination voting 
methods. The more recent literature is moving towards Kemeny's 
axiomatic framework [35, 36]. 

Let A and B be two rankings and let d(A, B) be a distance between them. The 
Kemeny-Snell axioms are: 


1. d(A, B) must be a metric. 

2. d(A, B) = d(A’, B’), where A’ and B’ result from A and B respectively by the 
same permutation of the alternatives. 

3. If two rankings A and B agree except for a set S of k elements, which is a segment 
of both, then d(A, B) may be computed as if these k objects were the only objects 
being ranked. 

4. The minimum positive distance is 1. 


Suppose we have n objects to be ranked. In defining his distance, [35] made use of 
the same matrix representation of rankings as was used earlier by [40]. 

Let $a_{ij}$ ($b_{ij}$) be the generic element of the $n \times n$ square preference matrix A (B), 
called score matrix, with $i, j = 1, \ldots, n$, in which $a_{ij} = 1$ if the i-th object is preferred 
to the j-th object, $a_{ij} = -1$ if the j-th object is preferred to the i-th object, and $a_{ij} = 0$ if 
the objects are tied. 

The distance is defined as: 

$$d(A, B) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij} - b_{ij}|.$$


The Kemeny distance is the unique measure satisfying these axioms, working with 
any kind of ranking (full, partial, incomplete, tied). As highlighted in [4, 33], it 
is naturally defined on the extended permutation polytope. Except for the Kendall 
distance, any other widely used distance (e.g., Spearman's footrule, Spearman's $\rho$, 
Hamming, Cayley, Ulam) either is not defined on the polytope (it does not preserve its 
geodesic nature) or behaves unexpectedly when dealing with tied rankings. 
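
To make the definition concrete, the following Python sketch builds the score matrix of a rank vector and computes the Kemeny distance as half the sum of the absolute element-wise differences. It assumes rankings are given as lists of ranks (lower rank means preferred, equal ranks mean a tie); the names `score_matrix` and `kemeny_distance` are illustrative only.

```python
def score_matrix(ranking):
    """Kemeny score matrix: 1 if i is preferred to j, -1 if j is preferred to i, 0 if tied."""
    n = len(ranking)
    # Booleans behave as 0/1, so the difference below is 1, -1 or 0.
    return [[(ranking[i] < ranking[j]) - (ranking[i] > ranking[j]) for j in range(n)]
            for i in range(n)]


def kemeny_distance(ranking_a, ranking_b):
    """Half the sum of absolute differences between the two score matrices."""
    a, b = score_matrix(ranking_a), score_matrix(ranking_b)
    n = len(ranking_a)
    total = sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))
    return total / 2


print(kemeny_distance([1, 2, 3], [1, 2, 2]))  # 1.0  (one pair becomes tied)
print(kemeny_distance([1, 2, 3], [1, 3, 2]))  # 2.0  (one adjacent pair swapped)
print(kemeny_distance([1, 2, 3], [3, 2, 1]))  # 6.0  (fully reversed, n(n-1) = 6)
```

The first example illustrates the fourth axiom: the minimum positive distance, equal to 1, is reached when two rankings differ only by a tie on one pair of objects.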

Let $X_1, \ldots, X_m$ be a set of m rankings of n objects. [36] defined the median 
ranking as that ranking: 

$$\hat{Y} = \arg\min_{Y} \sum_{k=1}^{m} d(X_k, Y).$$


Recently, [42] proposed a new rank correlation coefficient, named tau extension, in 
this way: 

$$\tau_x(A, B) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}}{n(n-1)},$$

where $a_{ij}$ and $b_{ij}$, $i, j = 1, \ldots, n$, are the elements of the score matrices of the 
rankings A and B, slightly modified with respect to the original Kendall's formulation 
($a_{ij} = 1$ if the i-th object is in a tie with the j-th object). Emond and Mason proved 
that $\tau_x(A, B) = 1 - \frac{2 d(A, B)}{n(n-1)}$ and formulated the original Kemeny's problem in this 
way: 

$$\hat{Y} = \arg\max_{Y \in Z^n} \frac{\sum_{k=1}^{m} w_k \sum_{i,j=1}^{n} x_{ij}^{(k)} y_{ij}}{n(n-1) \sum_{k=1}^{m} w_k} = \arg\max_{Y \in Z^n} \sum_{i,j=1}^{n} c_{ij} y_{ij}, \tag{1}$$

where $w_k$ is a weight associated to the k-th ranking (i.e., the frequency associated 
to each of the rankings), $x^{(1)}, \ldots, x^{(m)}$ is the set of m modified score matrices 
associated to the m rankings, $c_{ij} = \sum_{k=1}^{m} w_k x_{ij}^{(k)}$ represents the elements of a matrix 
C called combined input matrix, and $y_{ij}$ depicts the elements of the score matrix 
associated to the ranking Y. 

The combined input matrix contains all the information that a set of m judges can 
‘transfer’ in the rank aggregation problem. 

Emond and Mason [42] conceived a branch-and-bound algorithm to maximize 
Eq. (1) by defining an upper limit on the value of that dot product. This limit, 
considering that the score matrix $y_{ij}$ consists only of the values 1, 0 and -1, is given by 
the sum of the absolute values of the elements of C: 

$$V = \sum_{i=1}^{n} \sum_{j=1}^{n} |c_{ij}|.$$

D'Ambrosio et al. [43] proposed a variation of the proposals by [42] and by [5] for 
partial rankings. 
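
The quantities of this section can be illustrated with a small Python sketch: the modified score matrix (ties scored as 1), the tau extension, the combined input matrix C, the objective of Eq. (1) without its constant denominator, and the upper limit V. The toy data, the function names and the brute-force search over full rankings are assumptions made for illustration; they are not the branch-and-bound algorithm of [42], and the exhaustive search is only feasible for very small n.

```python
from itertools import permutations


def em_score_matrix(ranking):
    """Modified score matrix: 1 if i is ranked ahead of or tied with j, -1 otherwise, 0 on the diagonal."""
    n = len(ranking)
    return [[0 if i == j else (1 if ranking[i] <= ranking[j] else -1) for j in range(n)]
            for i in range(n)]


def tau_x(ranking_a, ranking_b):
    """Tau extension: normalized dot product of the two modified score matrices."""
    a, b = em_score_matrix(ranking_a), em_score_matrix(ranking_b)
    n = len(ranking_a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * (n - 1))


def combined_input_matrix(rankings, weights):
    """C = weighted sum of the modified score matrices of the input rankings."""
    n = len(rankings[0])
    c = [[0] * n for _ in range(n)]
    for ranking, w in zip(rankings, weights):
        x = em_score_matrix(ranking)
        for i in range(n):
            for j in range(n):
                c[i][j] += w * x[i][j]
    return c


def objective(c, candidate):
    """Dot product between C and the score matrix of the candidate ranking."""
    y = em_score_matrix(candidate)
    n = len(candidate)
    return sum(c[i][j] * y[i][j] for i in range(n) for j in range(n))


def upper_bound(c):
    """Upper limit V: sum of the absolute values of the elements of C."""
    return sum(abs(value) for row in c for value in row)


rankings = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]  # hypothetical toy data, three judges
weights = [1, 1, 1]
c = combined_input_matrix(rankings, weights)
best = max(permutations(range(1, 4)), key=lambda y: objective(c, y))
print(c)                                   # [[0, 1, 3], [-1, 0, 1], [-3, -1, 0]]
print(best, objective(c, best), upper_bound(c))  # (1, 2, 3) 10 10
```

In this toy case the best full ranking attains the upper limit V, which is the situation in which a branch-and-bound search can stop immediately.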


3 A Memetic Algorithm for Consensus Ranking 


This chapter aims at presenting a novel technique to perform consensus ranking based 
on a hybrid evolutionary approach, named Memetic Algorithms (MAs). MAs 
are population-based search methods which combine evolutionary 
algorithms (EAs) with local search (LS) methods. In detail, EAs are a set of 
meta-heuristics inspired by natural processes and aimed at solving complex optimization 
problems. Among the most popular EAs are Genetic Algorithms (GAs). 
In particular, GAs are based on the Darwinian principle of natural selection, which 
leads to the survival of only the fittest individuals [46]. They address optimization 
problems by manipulating a population of potential solutions and by reproducing the 
mechanisms of evolutionary processes in nature. Specifically, a GA starts from a 
population of randomly generated individuals and proceeds through successive generations. 
In each generation, as in nature, a selection process provides the mechanism for 
selecting better solutions to survive. Each solution is evaluated by means of a fitness 
function that reflects how good it is compared with other solutions in the population. 
The higher the fitness value of an individual, the higher its chances of surviving. 

Fig. 1 The general structure of a memetic algorithm 


Recombination of genetic material in genetic algorithms is simulated through two 
operators: crossover that exchanges portions between two randomly selected chro- 
mosomes and mutation that causes random alteration of the chromosome genes. The 
evolution of the algorithm ends when a predefined termination criterion is achieved. 

Although GAs have been successfully applied in several domains, they generally 
suffer from slow convergence when locating a sufficiently precise solution because of 
their failure to exploit local information [49]. Unfortunately, this often limits the 
practicality of EAs on many large-scale real-world problems where the computational time 
is a crucial consideration. MAs have arisen as a response to the problems shown by 
EAs [45]. Indeed, thanks to their ability to extend conventional evolutionary 
algorithms with a stage of local search optimisation, MAs usually succeed in finding 
higher quality solutions than traditional evolutionary algorithms. 

In general, the structure of MAs is shown in Fig. 1. 

During the design of an MA, it is necessary to address some issues [50] such as 
the following: 


• Local search frequency: How often should local search be applied within the 
evolutionary cycle? 

• Individual selection mechanism: Which individuals in the population should be 
improved by local search? 

• Local search intensity: How much computational effort should be allocated to each 
local search? 

• Local search method: Which local search procedure should be used? 


Therefore, during the design of our MA for consensus ranking, we take these issues 
carefully into account. 

Broadly, in order to apply MAs to our problem, it is necessary to define 
the following points: 


• the individual structure aimed at representing the solution of our problem, i.e., a 
consensus ranking; 

• the employed fitness function, which allows the evaluation of the considered 
solutions; 

• several issues about the integrated local search process; 

• the termination criteria. 


Hereafter, each of these points is discussed in detail as a necessary and 
preliminary step for introducing our memetic approach for consensus ranking. 


3.1 Solution Encoding 


An individual represents a candidate consensus ranking synthesizing the opinions on 
n objects by a sample of m judges. Hence, an individual is encoded as an integer vector 
characterized by a cell for each object (i.e., the dimension of the individual is n). 
Each cell contains an integer representing the rank associated with the corresponding 
object. In other words, if the i-th vector cell contains the integer j, the i-th object is 
assigned the rank j. 


3.2 Fitness Function 


The fitness function, denoted as $\Phi$, is represented by the sum of the Kemeny distances 
between the candidate consensus and all the rankings in the data matrix, as already 
exploited in [41]. Formally: 

$$\Phi = \sum_{k=1}^{m} w_k \, \frac{n(n-1)}{2} \, \bigl(1 - \tau_x(c_{ij}, y_{ij})\bigr),$$

where $\tau_x(c_{ij}, y_{ij})$ is the averaged $\tau_x$ rank correlation coefficient associated to the 
candidate consensus ranking Y and $c_{ij}$ represents the elements of the combined input 
matrix. Therefore, the fitness function is to be minimized. 
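
As a hedged illustration of this fitness function, the sketch below computes the averaged tau extension from the weighted score matrices and then rescales it into a sum of Kemeny distances. The function names, the toy rankings and the choice of plain Python lists are assumptions for the example; the chapter's own implementation is written in Java and is not reproduced here.

```python
def em_score_matrix(ranking):
    """Modified score matrix: 1 if i is ranked ahead of or tied with j, -1 otherwise, 0 on the diagonal."""
    n = len(ranking)
    return [[0 if i == j else (1 if ranking[i] <= ranking[j] else -1) for j in range(n)]
            for i in range(n)]


def fitness(candidate, rankings, weights):
    """Weighted sum of Kemeny distances between the candidate and the input rankings."""
    n = len(candidate)
    y = em_score_matrix(candidate)
    total_weight = sum(weights)
    dot = 0
    for ranking, w in zip(rankings, weights):
        x = em_score_matrix(ranking)
        dot += w * sum(x[i][j] * y[i][j] for i in range(n) for j in range(n))
    # Averaged tau extension between the candidate and all input rankings.
    averaged_tau = dot / (n * (n - 1) * total_weight)
    return total_weight * n * (n - 1) / 2 * (1 - averaged_tau)


rankings = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]  # hypothetical toy data
weights = [1, 1, 1]
print(fitness([1, 2, 3], rankings, weights))  # about 4, i.e. distances 0 + 2 + 2
print(fitness([3, 2, 1], rankings, weights))  # about 14, i.e. distances 6 + 4 + 4
```

Lower values are better: the candidate (1 2 3) is closer to the three judges than its reverse.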


3.3 Genetic Operators 


The proposed MA performs the global search process by applying the following 
traditional genetic operators: single-point crossover and uniform mutation. In 
general, the crossover operator takes two individuals, called parents, and produces two 
new individuals, called offspring, by exchanging the genes of the parents. In the 
literature, there exist different kinds of crossover. In this work, we exploit the traditional 
single-point version. In more detail, the crossover operator randomly selects two 
chromosomes from the population and “mates” them by randomly picking a gene 
and then swapping that gene and all subsequent genes between the two chromosomes. 
The crossover operator is applied with a certain rate $r_c$. Therefore, it is performed 
$1/r_c$ as many times as there are chromosomes in the population. Instead, the mutation 
operator runs through the genes of each chromosome in the population and 
mutates them in statistical accordance with the given mutation rate $p_m$. Therefore, the 
number of mutation operations is not deterministic. The application of these genetic 
operators produces a set of new individuals. They are added to the population and 
evaluated by means of the fitness function. After these conventional genetic steps, 
our algorithm applies a genetic selection operator, denoted as deterministic sampling, 
which selects the best individuals to generate the new population. 
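
The three operators described above can be sketched in Python as follows. This is an illustrative reading, not the chapter's Java code: the function names are hypothetical, mutation is assumed to draw a new rank uniformly from 1 to n, and offspring with repeated ranks are assumed to be read as tied rankings, a detail the text does not specify.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Pick a random gene and swap it and all subsequent genes between the parents."""
    cut = rng.randrange(len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


def uniform_mutation(individual, p_m, rng):
    """Replace each gene, with probability p_m, by a rank drawn uniformly from 1..n."""
    n = len(individual)
    return [rng.randint(1, n) if rng.random() < p_m else gene for gene in individual]


def select_best(population, fitness, size):
    """Deterministic selection: keep the `size` individuals with the lowest fitness."""
    return sorted(population, key=fitness)[:size]


rng = random.Random(0)  # seeded only to make the example repeatable
parents = [1, 2, 3, 4], [4, 3, 2, 1]
children = single_point_crossover(*parents, rng)
mutated = [uniform_mutation(child, 0.02, rng) for child in children]
survivors = select_best(list(parents) + mutated, fitness=sum, size=2)  # `sum` is a stand-in fitness
```

Because each gene is mutated independently with probability $p_m$, the number of mutations per generation varies from run to run, which matches the remark in the text that it is not deterministic.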


3.4 Local Search Phase 


In general, local search strategies carry out an iterative search to find an optimum 
solution in the neighborhood of a candidate. Starting from an initial candidate 
solution, these methods use local information to select the next current solution 
in the neighborhood. In each iteration, typically, the current candidate is improved 
through a mutation operator. The process ends when the solution cannot be improved 
further or after a prefixed number of iterations. In particular, the proposed MA performs 
a local search phase by exploiting the so-called Hill Climbing algorithm. This is an 
iterative local search method which attempts, during each iteration, to find a better 
solution by incrementally changing a single component of the current one. There 
are several variants of Hill Climbing search depending on how the next solution is 
chosen. The proposed MA uses the stochastic version of Hill Climbing, which selects 
a neighbor at random. 
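
A minimal Python sketch of stochastic hill climbing on a rank vector is given below. It assumes that a neighbor is obtained by changing the rank of one randomly chosen object and that only strict improvements are accepted; both choices, as well as the function name and the stand-in fitness used in the example, are assumptions rather than details taken from the chapter.

```python
import random


def stochastic_hill_climbing(start, fitness, iterations, rng):
    """Minimize `fitness` by repeatedly trying one random single-component change."""
    current, current_fit = list(start), fitness(start)
    n = len(current)
    for _ in range(iterations):
        neighbor = list(current)
        neighbor[rng.randrange(n)] = rng.randint(1, n)  # change a single rank
        neighbor_fit = fitness(neighbor)
        if neighbor_fit < current_fit:                  # accept only improvements
            current, current_fit = neighbor, neighbor_fit
    return current, current_fit


# Stand-in fitness for illustration: distance from a hypothetical target ranking.
target = [1, 2, 3, 4]
toy_fitness = lambda r: sum(abs(x - t) for x, t in zip(r, target))
solution, value = stochastic_hill_climbing([4, 4, 4, 4], toy_fitness, 5000, random.Random(0))
```

In a memetic algorithm, `fitness` would be the function of Sect. 3.2 and `iterations` would play the role of the local search intensity.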


3.5 Termination Criteria 


Like other evolutionary algorithms, MAs stop their evolution process 
according to a termination criterion. In the literature, there are several termination 
criteria, such as the number of generations and the number of fitness function 
evaluations. Our MA can be used with any termination criterion; however, in the 
experimental session, we use as termination criterion the reaching of a maximum number 
of consecutive generations without improvements in the fitness function. This limit, 
denoted as L, can be set arbitrarily. 
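
The stagnation-based criterion can be sketched as a small driver loop in Python. Here `step` is a hypothetical callable that runs one generation and returns the best fitness found in it; the loop stops once `limit` (the parameter L) consecutive generations bring no improvement. The names and the toy sequence of fitness values are assumptions for illustration.

```python
def run_until_stagnation(step, initial_best, limit):
    """Call `step` until `limit` consecutive generations fail to improve the best fitness."""
    best = initial_best
    stagnant = 0
    generations = 0
    while stagnant < limit:
        candidate = step()          # best fitness of the new generation (to be minimized)
        generations += 1
        if candidate < best:
            best, stagnant = candidate, 0
        else:
            stagnant += 1
    return best, generations


# Toy run: the fitness improves twice and then stalls.
fitness_per_generation = iter([9, 7, 7, 7, 7, 7])
best, generations = run_until_stagnation(lambda: next(fitness_per_generation), 10, limit=3)
print(best, generations)  # 7 5
```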
4 Experiments and Results 


This section shows the benefits provided by our MA with respect to 
existing approaches in the literature. To this end, we perform a set 
of experiments involving well-known datasets. Hereafter, the description of these 
datasets, the configuration of the experiments and the results are given. 


4.1 Datasets 


The experimental session exploits well-known real datasets. In 
particular, we use the USA rank data set (from now on denoted as USAranks), which 
is freely available from the internal repository of the R package ConsRank [47]. It is a 
random subset of the rankings collected by O'Leary Morgan and Morgon [51] on the 
50 American states. The number of items is equal to 50, and the number of rankings 
is equal to 104. Along with USAranks, we use two datasets coming from the PrefLib 
repository [48], i.e., a reference library of preference data containing over 3,000 freely 
available data sets. We decided to show the ED-00010.001.toc data set (from now 
on denoted as ED10), which contains 54 items (the racers of the 1961 Formula 1 
championship) and 8 judges (the eight races), and the ED-00015-001.soc data set (from 
now on denoted as ED15), which consists of 240 items (240 cities) and 4 judges. 


4.2 Experimental Settings 


The proposed MA has been implemented in Java using the library JMeme [52]. 
Our method is compared with the evolutionary state-of-the-art approach, denoted as 
DECoR, reported in [41] and implemented in R.¹ The configuration parameters of 
the compared algorithms are reported in Table 1. The parameters of our approach 
are set experimentally, whereas the setting used for DECoR is that specified by its 
authors, as we assume that their choices are optimal. 

To perform a fair comparison, both algorithms have the same stopping criterion, 
i.e., the maximum number L of consecutive generations without improvements 
in the fitness function. Following the DECoR authors, the value of the parameter 
L is set to 150. As for the metrics used to evaluate the performance of the compared 
approaches, we use two measures: the accuracy of the obtained consensus ranking, 
computed in terms of $\tau_x$, and the time consumption. However, it is worth noting that 
the comparison in terms of time could be influenced by the fact that the 
compared approaches are implemented in two different programming languages. 


¹ https://cran.r-project.org/web/packages/ConsRank/index.html 
Table 1 Parameters used by the compared methods for the rank aggregation problem 

Algorithm   Parameters 
MA          Population size = 15, L = 150, $r_c$ = 0.8, $p_m$ = 0.02, local intensity $l_i$ = 10%, 
            local frequency $l_f$ = 50 iterations 
DECoR       Population size = 15, L = 150, F = 0.4, CR = 0.8 


Experiments for both compared approaches were run on a notebook 
equipped with an Intel Core i7-7700HQ at 2.80 GHz and 16 GB of RAM, without parallel 
processing. 


4.3 Results 


This section shows the comparison between our MA and DECoR. Table 
3 shows the results of the comparison in terms of $\tau_x$. The reported values are the 
minimum, the maximum, the median and the standard deviation over 25 runs. Table 
4, instead, shows the results in terms of time consumption (in seconds) averaged 
over the 25 runs. As an example, Table 2 shows the best solutions obtained by the 
compared approaches for the dataset USAranks. 

The results show that our MA outperforms DECoR for all datasets in terms of $\tau_x$, 
whereas it is faster than DECoR for 2 out of 3 datasets (USAranks and ED10). 


5 Conclusions 


This chapter presents, for the first time, the application of MAs to solve the rank 
aggregation problem. In a set of experiments involving well-known datasets, 
our proposal achieves better performance than DECoR, the evolutionary 
state-of-the-art algorithm for the rank aggregation problem. In the future, we will introduce 
an improved version of the proposed approach based on the competent design of 
memetic optimisation [53] and we will perform a more systematic experimental 
session involving a statistical analysis executed on a large collection of heterogeneous 
datasets. 
Table 2 Best solutions for the dataset USAranks 

MA                                      DECoR 
 1 California       26 Connecticut       1 California       26 Colorado 
 2 New York         27 Minnesota         2 New York         27 Connecticut 
 3 Florida          28 South Carolina    3 Florida          28 Minnesota 
 4 Maryland         28 Oregon            4 Maryland         29 Alabama 
 5 Louisiana        29 Oklahoma          5 Louisiana        30 South Carolina 
 6 New Mexico       30 Mississippi       6 New Mexico       32 Oklahoma 
 7 Texas            31 Arkansas          7 Delaware         33 Mississippi 
 8 Illinois         32 Hawaii            8 Texas            34 Arkansas 
 9 Delaware         33 Kentucky          9 Illinois         35 Hawaii 
10 Pennsylvania     34 Kansas           10 Pennsylvania     36 Kentucky 
11 Michigan         34 Rhode Island     11 Michigan         37 Kansas 
12 Georgia          35 Utah             12 Georgia          37 Rhode Island 
13 North Carolina   36 Nebraska         13 North Carolina   38 Utah 
14 New Jersey       36 Iowa             14 New Jersey       39 Iowa 
15 Massachusetts    37 Wyoming          15 Massachusetts    39 Nebraska 
15 Washington       38 West Virginia    16 Washington       40 Wyoming 
16 Virginia         39 Idaho            17 Ohio             41 West Virginia 
17 Tennessee        40 Maine            18 Virginia         42 Idaho 
18 Nevada           40 New Hampshire    19 Tennessee        43 Maine 
19 Arizona          41 Montana          20 Nevada           44 Montana 
20 Missouri         42 Vermont          21 Arizona          45 New Hampshire 
21 Wisconsin        42 South Dakota     22 Missouri         46 South Dakota 
22 Alabama          43 North Dakota     23 Indiana          47 Vermont 
23 Indiana                              24 Alaska           48 North Dakota 
24 Alaska                               25 Wisconsin 
25 Colorado 


Table 3 Results of the comparison between MA and DECoR in terms of $\tau_x$ 

         USAranks            ED10                ED15 
         MA      DECoR       MA      DECoR       MA      DECoR 
Max      0.3157  0.2977      0.6254  0.5226      0.7472  0.7421 
Median   0.3152  0.2975      0.6252  0.4812      0.7470  0.7421 
Min      0.3141  0.2928      0.5990  0.4655      0.7464  0.7421 
Std      0.0004  0.0010      0.0056  0.0124      0.0002  1.13E-16 
Table 4 Results of the comparison between MA and DECoR in terms of time consumption (in seconds) 

            MA      DECoR 
USAranks    1.4     26 
ED10        1.6     29.7 
ED15        115.8   22.6 
Mathematical Optimization and Application of Nonlinear Programming 


Alena Vagaska and Miroslav Gombar 


Abstract This chapter deals with mathematical optimization, focusing on the 
special role of convex optimization and nonlinear optimization in solving 
optimization problems. Optimization problems are highly significant in the design of 
technological processes and of engineering objects, as well as during their realization 
and operation. In the surface treatment of metals, the duration of the 
process is one of the most important parameters determining the efficiency 
of the entire process. If we manage to minimize the time needed to create a 
layer of the requested thickness under the given operating factors, economic profit can 
be maximized while securing the requested quality. To solve the optimization problem, 
nonlinear programming in Matlab was used. Based on the DOE methodology according to 
the central composite design, a set of experiments containing 40 runs was 
performed in order to identify and analyse the factors affecting the process of 
electrolytic alkaline zinc plating at a current density of $0.5\ \mathrm{A \cdot dm^{-2}}$. The influence of 
seven input factors on the final thickness of the formed zinc layer was investigated, 
and the mathematical-statistical model predicting the thickness of the formed layer 
is presented in the chapter. In order to save time, and thereby increase the 
efficiency of the technological process, nonlinear programming was used to optimize 
the zinc plating process and the process of anodic oxidation, both of which belong to 
the surface treatment processes. In this chapter, we discuss selected optimization 
methods and mathematical models. 
1 Introduction 


Optimization is always topical for many scientists, engineering students, industry 
practitioners and professionals, although it is a very old theme [1, 2]. Optimization is 
a technique that is used everywhere, from engineering design (civil, 
chemical, mechanical, electronic and aerospace engineering) to business 
management, from fashion technology to mass communication, from military to finance, 
from pharmacology to medicine, biology, meteorology, from automatic control to 
scheduling or human decision making, and in many other areas and of course in 
our daily activities [3-7]. We often do not realize that every day we solve various 
optimization tasks. For example, in the morning, we may wonder when we should 
get up so that we can sleep as long as possible without being late for work. Even when 
we plan our holidays, we want to maximize our enjoyment at the least cost (ideally 
free). We always intend to maximize something (profit, seller's price, quality, free 
time, etc.) or minimize something (cost, consumption, price from the buyer's point of 
view, time needed to perform activities, waiting time, weight, etc.) while satisfying 
certain conditions. 

The wide use of optimization is based directly on its nature, which is reflected in 
the simplest definition of optimization. Some define it as “the art of making things 
the best”. Many people do not like that definition as it may not be reasonable, or 
even possible, to do something in the very best possible way. In practice, doing some- 
thing as well as possible within practical constraints is very desirable. Optimization 
provides us with the means to make things happen in the best possible practical 
way. Without optimization, we accomplish this by using experience, intuition, and 
just plain luck. With optimization, we do it in a systematic way, where we use the 
power of a computer to examine more possibilities than any human being could 
ever attempt. Furthermore, the optimization approach makes sure that the search of 
optimal solutions is done as efficiently as possible [6]. 

In fact, we are constantly searching for the optimal solutions to every problem 
we meet, though we are not necessarily able to find such solutions. When the complexity 
of the problem of interest makes it impossible to search every possible solution or 
combination, the aim is to find good, feasible solutions in an acceptable timescale. 
Sometimes there is no guarantee that the best solutions can be found, and we may not 
even know whether an algorithm will work, or why it works when it does. 

Optimization can be defined as the process of finding the best solution to a problem 
in a certain sense and under certain conditions [1]. What we try to minimize or 
maximize is simply known as an objective function. 

Optimization became an independent area of mathematics in the 1940s, when Dantzig 
presented the so-called simplex algorithm for linear programming. 

The development of nonlinear programming accelerated after the introduction of 
conjugate gradient methods and quasi-Newton methods in the 1950s. 

In order to compute a solution of a given optimization problem, it is necessary to use 
a solution method for a certain class of optimization problems. A solution method 
is an algorithm that computes a solution of the problem to some given accuracy. 
Since the 1940s, a large effort has gone into developing algorithms for solving various 
classes of optimization problems, analysing their properties, and developing good 
software implementations. The effectiveness of these algorithms, i.e., our ability to 
solve the optimization problem, varies considerably, and depends on factors such as 
the particular forms of the objective and constraint functions, how many variables 
and constraints there are, and special structure. 

To determine the optimal solution for objective functions, there exist today many 
modern optimization methods, which are able to solve a variety of optimization 
problems. They are now a necessary tool for solving problems in diverse fields, and 
many of them have become very popular. Linear programming, the simplex method, the 
assignment model, the transportation model, CPM, PERT, nonlinear programming, Simulated 
Annealing (SA), Genetic Algorithms (GA), and so on, play a vital role in 
determining the optimal solution [4]. Organizations are implementing these techniques to 
maximize their profits and minimize their costs. 

Indeed, mathematical optimization has become an important tool in many areas. 
The list of applications is still steadily expanding. For most of these applications, 
mathematical optimization is used as an aid to a human decision maker, system 
designer, or system operator, who supervises the process, checks the results, and 
modifies the problem (or the solution approach) when necessary. This human decision 
maker also carries out any actions suggested by the solution of the optimization 
problem, e.g., buying or selling assets to achieve the optimal portfolio. 

This chapter is mainly about the application of mathematical optimization in 
engineering practice; our research aimed to optimize technological processes of 
surface treatment. In the next sections we therefore describe some classes of 
optimization problems and selected optimization methods. We present the application 
of nonlinear programming to the optimization of the zinc plating process and the 
anodizing process using the MATLAB software toolbox. We also discuss the influence 
of input parameters on the results. 


2 Optimization Process 


The aim of optimization is to provide tools for decision making that help to choose 
the best possible solution of a specific task, which may be of a very different nature 
(e.g., to establish values of production parameters such as temperature and pressure 
in order to minimize production costs while adhering to technological procedures). 
A few steps are needed to make such a qualified decision [8]. 


1. The first and most important step in an optimization process is the construction 
of an appropriate model that mathematically describes the object or process about 
which a decision has to be made; this step can be a problem in itself. For the 
decision-making process, the object of our decision must have defined, quantifiable 
parameters (input parameters that we can influence and output parameters that 
we want to influence). We call the object of our interest the system. 
The model itself is a mathematical description of the system, which must include 
a quantitative criterion (profit, cost, time, energy, etc.) on the basis of which 
the success of parameter selection can be evaluated. The criterion value can be 
determined by measurement or calculation [1]. 


The success of solving the whole optimization task often depends on the creation 
of an adequate model. If the model is too simplified, the results may not reflect reality. 
On the other hand, if the model is too complex, solving the task becomes difficult or 
even impossible. It should be noted that the modeling process itself often involves 
solving optimization tasks (e.g., in identifying input-output dependencies) [1–4, 8]. 

Once an appropriate model has been created, it is possible to formulate an 
optimization problem. 

Modeling the optimization problem, also called problem formulation, is a process 
that involves clearly defining the objective function, variables and constraints. In 
this context, variables and constraints could be of different types (e.g., continuous 
and discrete variables, equality and inequality constraints). Problem formulation also 
involves defining the upper and lower bounds of the variables [6]. 


2. The optimization problem must then be solved by applying an appropriate 
algorithm. There is no universal algorithm for solving all problems; rather, there 
is a set of algorithms suitable for individual types of problems. Which of the known 
and available algorithms is used is usually decided by the user. The choice of 
algorithm largely determines the success and computational complexity 
of the solution. Modeling the optimization problem is strongly linked to the 
choice of optimization algorithm. In other words, the class of optimization 
algorithms available to solve a problem depends on how that problem is formulated. 
This relationship often drives researchers to make important approximations 
in their problem formulation (e.g., converting equality constraints to inequality 
constraints using a tolerance value) in order to leverage powerful algorithms that 
perform well in the absence of equality constraints. Each class of optimization 
problem generally calls for a different type of solution approach, as 
well as a different class of software or algorithm [3, 6]. 

3. After a solution has been obtained by one of the numerical methods, it must 
be verified that the result really is the sought solution of the optimization problem. 
Optimality conditions are often used for this verification. It is also important 
to interpret the solution from the perspective of the specific task, which can lead 
to a modification (refinement) of the model. 
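
The approximation mentioned in step 2, replacing an equality constraint by inequality constraints with a tolerance, can be sketched as follows. This is only an illustrative sketch: the helper names `relax_equality` and `satisfies`, the example constraint and the tolerance value are assumptions and do not come from the cited sources.

```python
from typing import Callable, List, Sequence

Constraint = Callable[[Sequence[float]], float]


def relax_equality(h: Constraint, tol: float) -> List[Constraint]:
    """Turn h(x) = 0 into two inequalities g(x) <= 0 that mean |h(x)| <= tol."""
    if tol < 0:
        raise ValueError("tol must be non-negative")
    return [
        lambda x: h(x) - tol,   # h(x) <= tol
        lambda x: -h(x) - tol,  # h(x) >= -tol
    ]


def satisfies(constraints: List[Constraint], x: Sequence[float]) -> bool:
    """True if every inequality constraint g(x) <= 0 holds at x."""
    return all(g(x) <= 0.0 for g in constraints)


# Hypothetical equality constraint x0 + x1 = 1, relaxed with tolerance 1e-3.
h = lambda x: x[0] + x[1] - 1.0
relaxed = relax_equality(h, tol=1e-3)

print(satisfies(relaxed, [0.5, 0.5]))   # point on the original equality
print(satisfies(relaxed, [0.5, 0.51]))  # point outside the tolerance band
```

The tolerance trades accuracy of the original equality for the freedom to use algorithms that handle only inequality constraints.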


2.1 Mathematical Optimization 


First, it is necessary to define an objective function, which is an abstraction 
of the problem of making the best possible choice from a set of candidate choices. 
The objective function might be, for example, a measure of the overall risk of the 
portfolio return, a measure of profit, the purity of a material, the duration of a 
technological process, or a measure of misfit or prediction error between the observed 
data and the values predicted by the model [1–3, 6]. 

The objective function depends on certain characteristics of the system, which are 
known as variables. The goal is to find the values of those variables, for which the 
objective function reaches its best value, which we call an extremum or an optimum. 
The variables must be chosen in such a way that they satisfy certain conditions, 
i.e., constraints. 

A mathematical optimization problem, or just optimization problem (OP), has the 
form 


$$\min\{\, f(x) : x \in X \,\}, \quad \text{where } f : X \to \mathbb{R} \text{ and } X \subset \mathbb{R}^n \tag{1}$$


Here $x = (x_1, \ldots, x_n)$ is the optimization variable of the problem, the function 
$f : \mathbb{R}^n \to \mathbb{R}$ is the objective function, $X$ is called the search space or the 
choice set, and each element $x \in X$ is called a candidate solution or feasible solution. 
If the set $X$ is empty, the optimization problem is infeasible. 

If there is a sequence $x^k \in X$ such that $f(x^k) \to -\infty$ as $k \to \infty$, the problem 
is unbounded. When $X = \mathbb{R}^n$, we speak of unconstrained optimization; when 
$X \subset \mathbb{R}^n$, we speak of constrained optimization [3, 9]. 

Sometimes, in applications, the set of input parameters is bounded, i.e., the 
input parameters take values within the allowed space of input parameters (search 
space $X$). The optimization task is then to choose $n$ control variables $x_1, x_2, \ldots, x_n$ 
from the search space in such a way that the objective function reaches its minimum 
(maximum), so the optimization problem has the form 


$$
\begin{aligned}
&\text{minimize} && f_0(x_1, x_2, \ldots, x_n) \\
&\text{subject to} && f_i(x) \le b_i, \quad i = 1, \ldots, m
\end{aligned}
\tag{2}
$$


Here the vector $x = (x_1, x_2, \ldots, x_n)$ is the optimization variable of the problem, 
the function $f_0 : \mathbb{R}^n \to \mathbb{R}$ is the objective function, the functions 
$f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$, are the (inequality) constraint functions, and the 
constants $b_1, b_2, \ldots, b_m$ are the limits, or bounds, for the constraints. A vector $x^*$ 
is called optimal, or a solution of the problem (2), if it has the smallest objective value 
among all vectors that satisfy the constraints: for any $z$ with 
$f_1(z) \le b_1, \ldots, f_m(z) \le b_m$, we have $f_0(z) \ge f_0(x^*)$. 

We generally consider families or classes of optimization problems, characterized 
by particular forms of the objective and constraint functions. As an important example, 
the optimization problem (2) is called a linear programming (LP) problem if the 
objective and constraint functions $f_0, f_1, f_2, \ldots, f_m$ are linear, i.e., satisfy 

$$f_i(\alpha x + \beta y) = \alpha f_i(x) + \beta f_i(y) \tag{3}$$

for all $x, y \in \mathbb{R}^n$ and all $\alpha, \beta \in \mathbb{R}$. 

If the optimization problem is not linear, it is called a nonlinear program or 
nonlinear optimization problem [3]. 

Linear programming and linear optimization mean the same thing. The former is 
often used for historical reasons. When either the objective function or any of the 
constraints is a nonlinear function of the variables, the problem is called a nonlinear 
programming (NP) problem (which means the same as a nonlinear optimization 
problem). Generally, LP problems are much easier to solve than NP problems, 
particularly when the dimension of the problem is large. In addition, the more 
nonlinear the problem is, the more difficult it may be to optimize [6]. 

A special role in optimization is played by the class of convex optimization 
problems. A convex optimization problem is one in which the objective and constraint 
functions are convex, which means they satisfy the inequality 

$$f_i(\alpha x + \beta y) \le \alpha f_i(x) + \beta f_i(y) \tag{4}$$

for all $x, y \in \mathbb{R}^n$ and all $\alpha, \beta \in \mathbb{R}$ with $\alpha + \beta = 1$, $\alpha \ge 0$, $\beta \ge 0$. 
Comparing (4) and (3), we see that convexity is more general than linearity: an 
inequality replaces the more restrictive equality, and the inequality must hold only 
for certain values of $\alpha$ and $\beta$. Since any linear program is therefore a convex 
optimization problem, we can consider convex optimization to be a generalization of 
linear programming. 
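
The difference between conditions (3) and (4) can be illustrated by a simple numerical spot check in one dimension. The sketch below is not a proof of convexity; the function name `violates_convexity`, the test functions and the sampling intervals are assumptions chosen only for illustration.

```python
import math
import random
from typing import Callable


def violates_convexity(f: Callable[[float], float], lo: float, hi: float,
                       trials: int = 10_000, seed: int = 0,
                       slack: float = 1e-9) -> bool:
    """Search random points in [lo, hi] for a violation of inequality (4).

    Returns True as soon as f(a*x + b*y) > a*f(x) + b*f(y) is observed for
    some a + b = 1 with a, b >= 0. A False result is only numerical evidence
    of convexity on [lo, hi], not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        a = rng.random()
        b = 1.0 - a
        if f(a * x + b * y) > a * f(x) + b * f(y) + slack:
            return True
    return False


linear = lambda x: 3.0 * x      # satisfies (3), hence also (4)
quadratic = lambda x: x * x     # convex, but not linear
sine = lambda x: math.sin(x)    # not convex on [0, 2*pi]

for name, f, lo, hi in [("linear", linear, -10.0, 10.0),
                        ("quadratic", quadratic, -10.0, 10.0),
                        ("sine", sine, 0.0, 2.0 * math.pi)]:
    print(name, "violation found:", violates_convexity(f, lo, hi))
```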

There are, however, some important exceptions to the general rule, that most 
optimization problems are difficult to solve. For a few problem classes we have 
effective algorithms that can reliably solve even large problems, with hundreds or 
thousands of variables and constraints. Two important and well known examples are 
least-squares problems and linear programming. 

An important class of optimization problems is linear programming, in which the 
objective and all constraint functions are linear, so the choice set $X$ is given by a 
set of linear equalities and inequalities: 

$$
\begin{aligned}
&\text{minimize} && c^T x \\
&\text{subject to} && a_i^T x = b_i, \quad i \in I, \\
& && a_j^T x \le b_j, \quad j \in J,
\end{aligned}
\tag{5}
$$

where $a_i, a_j, c \in \mathbb{R}^n$, $b_i, b_j \in \mathbb{R}$, $i \in I$, $j \in J$ are given and $x \in \mathbb{R}^n$. In general, 
this optimization problem can be transformed into a problem with equality constraints 
by introducing additional variables: 

$$
\begin{aligned}
&\text{minimize} && c^T x \\
&\text{subject to} && Ax = b, \\
& && x \ge 0,
\end{aligned}
\tag{6}
$$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ are given and $x \in \mathbb{R}^n$ is the vector variable. The form (6) 
is called linear programming (LP) in standard form. 
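
The transformation from inequality constraints to the equality form (6) by additional (slack) variables can be sketched as follows. The sketch assumes that all original variables are already non-negative and that only inequality constraints are present; the function name `add_slack_variables` and the small example problem are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def add_slack_variables(A_ub: Matrix, b_ub: Vector, c: Vector
                        ) -> Tuple[Matrix, Vector, Vector]:
    """Rewrite A_ub x <= b_ub, x >= 0 as [A_ub | I] (x, s) = b_ub, (x, s) >= 0.

    One slack variable s_j >= 0 is appended per inequality; slacks get zero cost.
    """
    m = len(A_ub)
    A_eq = [row[:] + [1.0 if j == k else 0.0 for k in range(m)]
            for j, row in enumerate(A_ub)]
    c_ext = c[:] + [0.0] * m
    return A_eq, b_ub[:], c_ext


# Hypothetical two-variable problem with two inequality constraints.
A_ub = [[1.0, 2.0], [3.0, 1.0]]
b_ub = [4.0, 6.0]
c = [-1.0, -1.0]
A_eq, b_eq, c_ext = add_slack_variables(A_ub, b_ub, c)

# A feasible point x and its slacks s = b_ub - A_ub x satisfy the equalities.
x = [1.0, 1.0]
s = [b - sum(a * xi for a, xi in zip(row, x)) for row, b in zip(A_ub, b_ub)]
residual = [sum(a * v for a, v in zip(row, x + s)) - b
            for row, b in zip(A_eq, b_eq)]
print(A_eq, c_ext, s, residual)
```

Free (sign-unrestricted) variables would need an additional splitting step, which is omitted here.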

There is no simple analytical formula for the solution of a linear program, 
but there is a variety of very effective methods for solving them, including the simplex 
method and the more recent interior-point methods [3, 10]. 

For general nonlinear optimization problems, modern stochastic optimization 
algorithms are often used [9, 11]. 

As we deal with a given optimization problem, it is important to know what class 
(or type) of problem it represents. This is because each different class of optimization 
problem generally calls for a different type of solution approach, as well as a different 
class of software or algorithm. Optimization problems can be classified along seven 
major categories: 


— Linear versus Nonlinear. 

— Constrained versus Unconstrained. 

— Discrete versus Continuous. 

— Single versus Multiobjective. 

— Deterministic versus Nondeterministic. 
— Simple versus Complex [6]. 


For example, [12, 26] deals with the optimization of the life cycle model, which is 
a piecewise-linear function. Some knowledge of modeling in leadership and logistics 
is contained in [13-15]. 

As mentioned earlier, LP optimization problems are much easier to solve than 
NP problems, particularly when the dimension of the problem is large (i.e., large 
number of design variables). In addition, the more nonlinear the problem is, the 
more difficult it may be to optimize. Fortunately, there is much that we can do 
to minimize these difficulties [6]. In what follows we focus on nonlinear 
programming in the MATLAB toolbox and its applications, which are presented in 
the next sections. 


3 Application of Nonlinear Programming to Optimize 
Technological Process of Electrolytic Alkaline Zinc 
Plating 


In this part of the chapter, we present some possibilities for using nonlinear 
optimization in engineering practice. The optimization problem is to find the optimal 
duration of the zinc plating process needed to obtain the required thickness of the 
protective layer. First, it was necessary to obtain an optimization model, so we focused 
on developing a mathematical-statistical model predicting the thickness of the layer 
formed during electrolytic alkaline zinc plating at a current density of 
0.5 A·dm⁻². We applied mathematical and statistical methods and Design of Experiments 
(DOE) in order to identify and analyse the factors affecting the zinc plating process. 
Based on the DOE methodology, a set of 40 experimental runs was performed according to 
a central composite design. The influence of seven input factors on the final thickness 
of the formed zinc layer was investigated, and a computational model predicting the 
thickness of the formed layer was developed. To save time and thereby increase the 
efficiency of the technological process, nonlinear programming was used to optimize 
the zinc plating process. 


3.1 Analysis of Zinc Plating Process 


Technological processes of surface treatment are multifactorial systems: complex 
nonlinear processes involving several technological, physical, chemical and material 
effects and their mutual interactions. For this reason, the analysis of these processes 
by classical methods appears to be inefficient and often leads to incorrect conclusions. 
Process analysis, observation, examination, comparison and synthesis (optimization and 
forecasting) are based on determining the links and relations between input and output 
parameters. It is important to identify whether certain factors (input parameters) 
influence the observed parameter (response), and then to find the factor levels at 
which the observed parameter reaches its optimum (maximum or minimum) [16, 17]. For 
such practical problems an experimental-statistical approach is more suitable than a 
deterministic one. With the experimental-statistical approach, the analysis and 
synthesis of surface treatment processes is carried out under conditions of incomplete 
information: even though the nature of the examined process is not completely known, 
the incomplete information needed to set optimal conditions is supplemented by 
experimentally obtained and known data. The wide use of various experimental methods 
is a consequence of incomplete information and of the continual improvement of existing 
objects, processes and procedures and the creation of new ones [18]. 

The process of electrolytic alkaline zinc plating also belongs to the multifactorial 
and nonlinear systems. With the right choice of technological factors it is possible 
to create a protective layer with the requested thickness and properties that fulfils 
defined criteria (e.g., corrosion resistance). Identification and analysis of the 
factors acting in this process, and of their influence on the created layer, became 
the object of our scientific research and experimental work. The majority of scientific 
and expert works on this problem analyse a selected parameter of the created layer in 
relation to only one factor [19, 20]. In contrast, our research is focused on a deeper 
and more complex identification of the influences of several factors and their 
interactions during electrolytic alkaline zinc plating, which required the use of 
Design of Experiments (DOE). Unlike the COST approach, DOE enables us to observe the 
joint influence of several factors on the response at the same time and to find the 
optimal combination of settings of the input process parameters. In the COST approach 
(COST is an acronym of the English expression “consider one separate factor at a time”) 
only one selected factor is changed at a time. This approach is inefficient: it does 
not provide the information necessary to reach the real optimum, and the experimental 
work is overpriced. 


3.2 Experimental Part 


S355J0 material was used to carry out the experiment. In the individual experiments, 
the above-mentioned method of electrolytic alkaline zinc plating at a current density 
of 0.5 A·dm⁻² was applied. A zinc electrolyte was used that is characterized by high 
depth efficiency, low zinc concentration and high coating ability. Zinc is currently 
a readily available option for protecting metals from corrosion and for creating 
special properties of the material surface. Zinc anodes were placed into a separate 
dissolution bath, where the zinc content can be regulated by lowering and lifting the 
anodes. Zinc is brought into the coating bath through a filter by the circulation 
circuit. During alkaline coating, it was necessary to keep the components of the 
glitter additives at the requested concentration in the electrolyte. Zinc plating of 
the samples was based on DOE with a selected central composite design comprising 40 
individual experiments. We were interested in the influence of 7 input factors on the 
resulting thickness of the created zinc layer. The experimental work was performed in 
order to obtain and express the functional dependency $y = f(x_1, x_2, x_3, x_4, x_5, x_6, x_7)$, 
where $x_1$ is the amount of NaOH in the electrolyte, $x_2$ is the amount of ZnO in the 
electrolyte, $x_3$ is the amount of glitter additive Pragogal Zn3401 in the electrolyte, 
$x_4$ is the amount of glitter additive Pragogal Zn3402 in the electrolyte, $x_5$ is the 
electrolyte temperature, $x_6$ is the time of zinc plating, and $x_7$ is the voltage. 
Experimental conditions and the levels of the individual variables (factors) can be 
found in Table 1. The thickness of the coating layer was measured on the individual 
samples at selected experimental points using the digital thickness meter MINITEST 
4000. The experimentally obtained data formed the input matrix for further statistical 
processing. 


Table 1 Indication and values of technological factors 

Coded factor | Factor    | Unit      | Level −2.21 | Level −1 | Level 0 | Level 1 | Level 2.21
x1           | m(NaOH)   | [g·l⁻¹]   | 19.33       | 80.00    | 120.00  | 180.00  | 240.67
x2           | m(ZnO)    | [g·l⁻¹]   | 3.15        | 8.00     | 12.00   | 16.00   | 20.85
x3           | m(Zn3401) | [ml·l⁻¹]  | 2.18        | 4.00     | 5.50    | 7.00    | 8.82
x4           | m(Zn3402) | [ml·l⁻¹]  | 0.68        | 2.50     | 4.00    | 5.50    | 7.32
x5           | T         | [°C]      | −0.13       | 12.00    | 22.00   | 32.00   | 44.13
x6           | t         | [min]     | 1.15        | 6.00     | 10.00   | 14.00   | 18.85
x7           | U         | [V]       | 0.79        | 2.00     | 3.00    | 4.00    | 5.21

Source: Own 

Design of Experiments, which was used to identify the significant factors influencing 
the thickness of the created layer, enabled us to obtain the maximum amount of 
information with high statistical and numerical precision from an optimal number of 
individual experiments [17]. 

The individual experiments were carried out according to the design matrix as 
combinations of the levels of the 7 input factors given in Table 1, and they were run 
in random order. This randomisation is needed to minimize systematic errors and to 
prevent subjective preference for some of the input factor levels. The orthogonality 
of the experimental plan was verified by means of scalar products, i.e., all columns 
of the design matrix must be non-zero and perpendicular to each other in order to 
avoid a wrong indication of statistical non-significance of regressors [21]. The 
well-known transformation relation [21], by which the original physical units are 
converted to non-dimensional form, was used to norm (code) the basic factors. 
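
The orthogonality check by scalar products described above can be sketched as follows. The design matrix of the actual experiment is not reproduced in this section, so the sketch uses a hypothetical full 2³ factorial design in coded units; the function names are likewise assumptions.

```python
from itertools import combinations, product
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Scalar product of two equally long columns."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal(columns: List[Sequence[float]], tol: float = 1e-12) -> bool:
    """True if every column is non-zero and all pairs have zero scalar product."""
    nonzero = all(dot(c, c) > tol for c in columns)
    perpendicular = all(abs(dot(u, v)) <= tol
                        for u, v in combinations(columns, 2))
    return nonzero and perpendicular


# Hypothetical example: full 2^3 factorial design in coded units (-1, +1).
runs = list(product([-1.0, 1.0], repeat=3))
columns = [[run[k] for run in runs] for k in range(3)]

print(is_orthogonal(columns))
```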


3.3 Results of Experiments 


Exploratory analysis, screening analysis, analysis of variance and DOE analysis were 
carried out as part of the statistical analysis of the experimentally obtained data. 
Using software products such as Matlab, Statistica, JMP and QC-Expert, we identified 
the significant factors that influence the final thickness of the zinc layer, analysed 
their interactions, and obtained the form and coefficients of the mathematical-statistical 
model that predicts the thickness of the created layer at changing factor levels. The 
data analysis followed a statistically correct approach involving verification of the 
basic assumptions and subsequent analysis of the classical regression triplet: data, 
model, residuals. It can therefore be said that the results were deduced and interpreted 
without numerical or statistical incorrectness, which was also confirmed by practical 
experience in the area of surface treatment. 

The basic analysis of the measured thickness of the created layer in the individual 
experiments is the analysis of variance (ANOVA), Table 2. 


Table 2 ANOVA table for proposed prediction model 

Source   | DF | Sum of squares | Mean square | F ratio | Prob > F
Model    | 6  | 207.5383       | 34.5897     | 5.9572  | 0.0003*
Error    | 33 | 191.6105       | 5.8064      |         |
C. Total | 39 | 399.1488       |             |         |


It can be judged from the ANOVA table that the variability caused by random errors 
(mean square) is markedly lower than the variability of the measured values explained 
by the model, and the obtained significance level (Prob > F) indicates the adequacy of 
the model based on the Fisher-Snedecor test. Further testing of the model by the 
lack-of-fit test (error test of insufficient model adjustment), in which the residual 
variance is compared with the variance of the measured data inside the groups in order 
to test whether the regression model sufficiently describes the observed dependency, 
can be found in Table 3. 


Table 3 Table of error of insufficient model adjustment (lack of fit) 

Source      | DF | Sum of squares | Mean square | F ratio | Prob > F | Max RSq
Lack of Fit | 28 | 171.2105       | 6.11466     | 1.4987  | 0.3486   | 0.9489
Pure Error  | 5  | 20.4           | 4.08        |         |          |
Total Error | 33 | 191.6105       |             |         |          |


Considering the significance value of 0.3486 obtained by the Fisher test, it can be 
said that the model sufficiently describes the variability of the experimentally 
obtained data. The adequacy of the model is confirmed: at the selected significance 
level of α = 0.05, the lack-of-fit variance is not significantly higher than the 
variance inside the individual groups. 
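
The derived entries of Tables 2 and 3 can be rechecked from the sums of squares and degrees of freedom alone. The short sketch below only repeats this arithmetic with the values transcribed from the tables; the variable names are assumptions, and the p-values are not recomputed because that would require the F distribution.

```python
# Sums of squares and degrees of freedom transcribed from Tables 2 and 3.
ss_model, df_model = 207.5383, 6
ss_error, df_error = 191.6105, 33
ss_total = 399.1488
ss_lack, df_lack = 171.2105, 28
ss_pure, df_pure = 20.4, 5

# Table 2: mean squares and F ratio of the model.
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_model = ms_model / ms_error

# Table 3: lack-of-fit F ratio and the maximum attainable R squared.
ms_lack = ss_lack / df_lack
ms_pure = ss_pure / df_pure
f_lack = ms_lack / ms_pure
max_r_squared = 1.0 - ss_pure / ss_total

print(f"MS model = {ms_model:.4f}, MS error = {ms_error:.4f}, F = {f_model:.4f}")
print(f"MS lack of fit = {ms_lack:.5f}, MS pure error = {ms_pure:.2f}, F = {f_lack:.4f}")
print(f"Max RSq = {max_r_squared:.4f}")
```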

Table 4 presents the estimates of the model parameters together with tests of the 
significance of the individual effects and their combinations at the significance level 
α = 0.05, based on the above-mentioned conditions and their fulfilment (Tables 2 and 3). 

It is obvious from the table of parameter estimates that voltage and time of zinc 
plating have the main influence on the thickness of the created zinc layer. Besides 
these individual factors, the second power of the amount of NaOH in the electrolyte 
and the second power of the amount of glitter additive Pragogal Zn3402 also have a 
significant impact. The absolute term of the model, which contains all “neglected” 
factors acting in the process of electrolytic alkaline zinc plating, has the highest 
importance. 


Table 4 Table of assessment of model parameters 

Term      | Estimate | Std Error | t Ratio | Prob>|t|  | Lower 95% | Upper 95% | VIF
Intercept | 14.10792 | 0.599205  |         | < 0.0001* | 12.88882  | 15.32701  |
x7        | 1.483657 | 0.432078  |         | 0.0016*   | 0.604588  | 2.362725  | 0.985741
x6        | 1.043233 | 0.432078  |         | 0.0215*   | 0.164164  | 1.922301  | 0.985741
x1·x1     | −1.06999 | 0.368621  |         | 0.0065*   | −1.81995  | −0.32002  | 0.911005
x1·x2     | −0.78191 | 0.512519  |         | 0.1366    | −1.82464  | 0.260817  | 0.985741
x5·x4     | 0.647948 | 0.512519  |         | 0.215     | −0.39478  | 1.690676  | 0.985741
x4·x4     | −0.79972 | 0.368621  | −2.17   | 0.0373*   | −1.54969  | −0.04976  | 0.911005

(x7: voltage, x6: time of zinc plating, x1: amount of NaOH, x2: amount of ZnO, x4: amount 
of glitter additive Pragogal Zn3402, x5: electrolyte temperature, Intercept: absolute term of the 
model; * factor or factor combination is significant at the selected significance level of 5%). 


The VIF (Variance Inflation Factor) values of the regressors are important indicators 
of non-orthogonality from the statistical point of view [21]. A VIF lower than or equal 
to 1 means that the predictors are not correlated (the plan is uncorrelated and 
orthogonal); a VIF higher than 1 but lower than 5 indicates medium correlation and plan 
non-orthogonality; a VIF higher than 5 but lower than 10 indicates significant 
correlation and plan non-orthogonality; and a VIF higher than 10 indicates 
multicollinearity of the regressors and plan non-orthogonality. If VIF > 1, the 
estimates of the regression coefficients are numerically correct, but their p-values 
are not. The VIF values are defined as the diagonal elements of the inverse of the 
correlation matrix $R$ of the regressors ($j = 1, \ldots, p$): 


$$\mathrm{VIF}_j = D_{jj} = \operatorname{diag}(D) = \operatorname{diag}(R^{-1}) \tag{7}$$
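
Relation (7) can be illustrated on a small correlation matrix. The sketch below inverts the matrix by plain Gauss-Jordan elimination and reads off the diagonal; the two example matrices are hypothetical and are not the correlation matrix of the regressors in Table 4, which is not given in this section.

```python
from typing import List

Matrix = List[List[float]]


def inverse(M: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(R: Matrix) -> List[float]:
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    D = inverse(R)
    return [D[j][j] for j in range(len(R))]


# Hypothetical correlation matrices of two regressors.
uncorrelated = vif([[1.0, 0.0], [0.0, 1.0]])
correlated = vif([[1.0, 0.8], [0.8, 1.0]])
print(uncorrelated, correlated)
```

For two regressors with correlation r the diagonal of the inverse equals 1 / (1 − r²), so uncorrelated regressors give a VIF of exactly 1.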


Since the results indicate uncorrelated predictors and orthogonality of the 
experimental plan, the prediction equation for the thickness of the created layer 
($\hat{y} = th$) can, based on Table 4, be expressed in coded form as 

$$
\begin{aligned}
\hat{y} = {} & 14.10792 + 1.483657\, x_7 + 1.043233\, x_6 - 1.06999\, x_1^2 \\
& - 0.78191\, x_1 x_2 + 0.647948\, x_5 x_4 - 0.79972\, x_4^2
\end{aligned}
\tag{8}
$$

To set up the prediction relation in the natural scale, it is necessary to realize that 
the factors used in the analysis were coded by DOE norming into a coded scale: 

$$
x_d(i) = \frac{x(i) - \dfrac{x_{\max} + x_{\min}}{2}}{\dfrac{x_{\max} - x_{\min}}{2}}
\tag{9}
$$

where $x_d(i)$ is the standardized variable according to the DOE methodology, $x(i)$ is 
the original input variable (factor), $i = 1, 2, 3, \ldots, n$, $n$ is the number of input 
factors, $x_{\max}$ is the maximum value of the original variable $x(i)$ [physical unit], and 
$x_{\min}$ is the minimum value of the original variable $x(i)$ [physical unit]. DOE 
standardization by formula (9) is a linear transformation of the values of the original 
variable from the interval $(x_{\min}, x_{\max})$ to the interval $(-1, 1)$ and transforms the 
factors from their original physical units to dimensionless form. 
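
A minimal sketch of this coding and its inverse is given below. It assumes that the −1 and +1 levels of Table 1 play the role of the minimum and maximum values, which is an interpretation and not stated explicitly in the text; the function names are hypothetical.

```python
def code_factor(x: float, x_min: float, x_max: float) -> float:
    """Map a physical value x linearly so that x_min -> -1 and x_max -> +1."""
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    centre = (x_max + x_min) / 2.0
    half_range = (x_max - x_min) / 2.0
    return (x - centre) / half_range


def decode_factor(x_coded: float, x_min: float, x_max: float) -> float:
    """Inverse of code_factor: coded value back to physical units."""
    centre = (x_max + x_min) / 2.0
    half_range = (x_max - x_min) / 2.0
    return centre + x_coded * half_range


# Plating time x6 from Table 1, assuming the -1 and +1 levels (6 and 14 min)
# are used as x_min and x_max.
coded_levels = [code_factor(t, 6.0, 14.0) for t in (6.0, 10.0, 14.0, 18.85)]
print(coded_levels)
```

Under this assumption the axial level of 18.85 min falls outside the interval (−1, 1), in agreement with the coded level of about 2.21 listed in Table 1.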

Considering the transformation relation and the statistical prediction Eq. (8), the 
prediction relation describing the observed dependency (the effect of the input 
variables on the response, i.e., the resulting thickness of the zinc coating) for a 
current density of 0.5 A·dm⁻² can be expressed as: 

$$
\begin{aligned}
th = {} & 0.386\, m(\mathrm{Zn3402}) - 0.035\, T + 0.671\, U + 0.118\, t + 0.034\, m(\mathrm{NaOH}) \\
& + 0.112\, m(\mathrm{ZnO}) - 8.736 \cdot 10^{-5}\, (m(\mathrm{NaOH}))^2 - 0.073\, (m(\mathrm{Zn3402}))^2 \\
& - 7.983 \cdot 10^{-4}\, m(\mathrm{NaOH})\, m(\mathrm{ZnO}) + 8.819 \cdot 10^{-3}\, m(\mathrm{Zn3402})\, T + 6.806
\end{aligned}
\tag{10}
$$


The prediction model serves as the basis for optimizing the process by nonlinear 
programming. 


3.4 Optimization of Zinc Plating Process in Matlab 


The nature of optimization problems lies in determining the combination of values 
of the individual factors at which “the best” value of the optimization parameter is 
obtained. The factor values obtained are called optimal values. Optimization problems 
are highly significant in the design of technological processes and engineering 
objects, in their realization, and during their operation. In the surface treatment 
of metals, the duration of the process is one of the most important parameters 
determining the efficiency of the entire process. If we manage to minimize the time 
needed to create a layer of the requested thickness by appropriately setting the acting 
factors, economic profit can be maximized while the requested quality is secured. If 
the time of zinc plating ($x_6$) is expressed from Eq. (10), an objective 
function for optimization is obtained. 
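
Because Eq. (10) is linear in the plating time, expressing the time from it is a simple rearrangement. The sketch below transcribes the coefficients of Eq. (10), with the powers of ten taken as 10⁻⁵, 10⁻⁴ and 10⁻³ (an assumption of this sketch); the function and argument names are hypothetical, and the sketch is not claimed to reproduce the optimization results reported in this chapter.

```python
def thickness(naoh: float, zno: float, zn3402: float,
              temp: float, volt: float, time: float) -> float:
    """Predicted zinc layer thickness according to Eq. (10), natural units."""
    return (0.386 * zn3402 - 0.035 * temp + 0.671 * volt + 0.118 * time
            + 0.034 * naoh + 0.112 * zno
            - 8.736e-5 * naoh ** 2 - 0.073 * zn3402 ** 2
            - 7.983e-4 * naoh * zno + 8.819e-3 * zn3402 * temp + 6.806)


def plating_time(th: float, naoh: float, zno: float, zn3402: float,
                 temp: float, volt: float) -> float:
    """Objective function: Eq. (10) solved for the plating time t.

    Eq. (10) is linear in t with coefficient 0.118, so
    t = (th - [all terms of Eq. (10) that do not contain t]) / 0.118.
    """
    rest = thickness(naoh, zno, zn3402, temp, volt, time=0.0)
    return (th - rest) / 0.118


# Round trip at an arbitrary, hypothetical operating point.
t = plating_time(12.0, naoh=100.0, zno=10.0, zn3402=4.0, temp=22.0, volt=3.0)
print(t, thickness(100.0, 10.0, 4.0, 22.0, 3.0, t))
```

A nonlinear programming solver would then minimize `plating_time` over the remaining factors for a fixed requested thickness.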

The basic task of nonlinear optimization is to find the minimum of the problem 
defined as 

$$
\min_x f(x) \quad \text{such that} \quad
\begin{cases}
c(x) \le 0 \\
c_{eq}(x) = 0 \\
A \cdot x \le b \\
A_{eq} \cdot x = b_{eq} \\
lb \le x \le ub
\end{cases}
\tag{11}
$$

where $x$, $b$, $b_{eq}$, $lb$ (lower bound) and $ub$ (upper bound) are vectors, $A$ and $A_{eq}$ 
are matrices of constant coefficients, $c(x)$ and $c_{eq}(x)$ are vector functions, and 
$f(x)$ is a scalar function; $f(x)$, $c(x)$ and $c_{eq}(x)$ may be nonlinear functions. The 
inequality constraints are obtained by defining the boundary conditions of the process 
of electrolytic alkaline zinc plating, considering the data presented in Tables 1 and 4 
and the nature of the process: 

$$80 \le m(\mathrm{NaOH}) \le 130 \tag{12}$$
$$7.5 \le m(\mathrm{ZnO}) \le 18 \tag{13}$$
$$2 \le m(\mathrm{Zn3402}) \le 6.5 \tag{14}$$
$$12 \le T \le 28 \tag{15}$$
$$2 \le U \le 5 \tag{16}$$

To solve optimization problem (11), nonlinear programming in Matlab was used. 
For a requested layer thickness of 12 µm, the most frequently requested thickness, 
and respecting the boundary conditions (12)–(16), the optimal time of 11.305 min is 
obtained at m(NaOH) = 112.585 g·l⁻¹, m(ZnO) = 18 g·l⁻¹, m(Pragogal Zn3402) = 
3.392 ml·l⁻¹, T = 12 °C and U = 5 V. The graphical representation of the optimization 
output is presented in Figs. 1, 2, 3, 4 and 5. 

Fig. 1 Graph of current points of optimization of zinc plating time for layer thickness of 12 µm 

Fig. 2 Total function evaluations of optimization of zinc plating time for layer thickness of 12 µm 

Fig. 3 Current function value of optimization of zinc plating time for layer thickness of 12 µm 

Fig. 4 Maximum constraint violation of optimization of zinc plating time for layer thickness of 12 µm 

Fig. 5 Step size of optimization of zinc plating time for layer thickness of 12 µm 


4 Application of Nonlinear Programming to Anodic 
Oxidation Process 


In this part of the chapter, we present the application of nonlinear programming to 
the optimization of the anodizing process. It is often the case that an industrial 
process is not operated at optimal conditions. Usually a process is affected by many 
factors which often interact and thereby influence the process in a non-additive manner. 
In technical practice it is therefore very important to understand how factors and 
responses relate to each other, and to reveal which factors are influential for which 
responses [16–18]. If a process is very complex and no suitable description or 
mathematical-physical-chemical model of it exists, the relationships between the 
considered variables have to be identified experimentally. 

Technological processes of surface treatment are often such complex, multifactorial 
and nonlinear systems. The most effective of them is the anodic aluminium 
oxidation (AAO) process, because with the correct setting of technological factors 
it is possible to create a protective layer with the desired thickness and properties 
(hardness, corrosion resistance, etc.). The application of mathematical and 
statistical methods helped us to identify and analyse the factors acting during the 
process of aluminium anodic oxidation at a defined current density of 1 A·dm⁻². The 
influence of input chemical and physical factors and their interactions on the 
resulting layer thickness was investigated. Based on the DOE methodology, a set of 26 
experimental test runs was carried out, the data were analysed, and a prediction model 
was developed, which is presented in our paper [23]. 


4.1 Experimental—Anodic Oxidation of Specimens 


The alloy EN AW-1050A H24 with dimensions 100 × 70 × 0.5 mm was used for 
the specimens. Anodic aluminium oxidation in sulphuric acid (GS method) with direct 
current at a constant current density $J_A = 1\ \mathrm{A \cdot dm^{-2}}$ was used for the individual 
test runs of the experiment. The anodizing of the aluminium samples was performed 
according to the DOE methodology, using a central composite design with 26 
test runs. A Hull cell, in the shape of a rectangular trapezium, was used for the 
individual test runs. Since one of the co-authors has been active in the field of 
surface treatment for many years, there was no problem with the actual implementation 
of the experiment. Moreover, this practical experience was very valuable when 
interpreting the research results. 

We investigated the influence of the input chemical factors $x_1$, $x_2$ ($x_1$: amount of
sulphuric acid H₂SO₄ in the electrolyte, $x_2$: concentration of aluminium cations Al³⁺
in the electrolyte) and the physical factors $x_3$, $x_4$ ($x_3$: electrolyte temperature
$T$, $x_4$: anodizing time $t$) on the resulting thickness $y$ of the aluminium anodic
oxide (AAO) layer formed at the constant current density $J_A = 1$ A·dm⁻², i.e. the
functional dependence $y = f(x_1, x_2, x_3, x_4)$. The thickness of the resulting AAO
layer was measured at defined experimental points with the digital thickness meter
MINITEST 4000. The experimental points were marked at the intersections of horizontal and
vertical lines; the distance between individual lines and from the bottom edge of the
sample was 5 mm. The layer thickness was measured five times at each experimental point.
In order to eliminate systematic measurement error, a measurement on an etalon (reference
standard) was performed. The arithmetic average of the five measurements, calculated by
the microprocessor of the thickness meter, was taken as an individual measurement.

In order to identify the significant factors affecting the thickness of the formed AAO
layer and to analyse the relationships between them, the DOE methodology was used.
Taking into account the expected non-linear dependencies, the central composite design
was chosen from the relatively large number of design types. Individual test runs were
performed according to the design matrix of the experiment, created as a combination of
the individual levels of the four investigated factors according to Table 5, where the
experimental conditions are presented. In line with DOE practice, the test runs were
performed in random order to eliminate systematic error and to avoid subjective
preference for any factor level. The orthogonality of the experimental design was
verified using scalar products, i.e. all columns of the design matrix must be nonzero and
mutually perpendicular. Due to the orthogonality of the experimental design, we can avoid
an improper indication of statistical insignificance of factor effects [22]. The
orthogonal array design is well suited for estimating main effects; orthogonal arrays
are highly desirable in the design of experiments because they possess many important
statistical properties, such as the statistical independence of estimated factor effects.

Table 5 Experimental conditions

Experiment                    GS
Method of anodic oxidation    GS
Design of experiment          Central composite design
Number of factors             4
Test runs                     26

Control factors
Coded scale   Natural scale   Physical unit   Factor levels
                                              -2       -1       0        1        2
x1            m(H2SO4)        [g·l⁻¹]         25.00    100.00   175.00   250.00   325.00
x2            m(Al)           [g·l⁻¹]         1.32     3.97     6.62     9.26     11.91
x3            T               [°C]            2.00     12.00    22.00    32.00    42.00
x4            t               [min]           10.00    25.00    42.50    60.00    80.00

Constant factors
Material anode/cathode        AW-1050A H24
Voltage                       12 V
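
The orthogonality check by scalar products can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the text gives the total of 26 runs and
the five coded levels, so the split into 16 factorial points, 8 axial points at coded
distance 2 and 2 centre points is assumed here, and all function names are hypothetical.

```python
from itertools import combinations, product


def central_composite_design(k=4, alpha=2.0, n_centre=2):
    """Build a coded central composite design as a list of runs (rows)."""
    # Full two-level factorial part: every combination of -1 and +1.
    factorial = [list(levels) for levels in product((-1.0, 1.0), repeat=k)]
    # Axial (star) part: one factor at +/- alpha, all others at the centre.
    axial = []
    for i in range(k):
        for level in (-alpha, alpha):
            run = [0.0] * k
            run[i] = level
            axial.append(run)
    # Centre points: all factors at coded level 0.
    centre = [[0.0] * k for _ in range(n_centre)]
    return factorial + axial + centre


def dot(u, v):
    """Scalar product of two equally long columns."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal(design):
    """True if all factor columns are nonzero and mutually perpendicular."""
    k = len(design[0])
    columns = [[run[j] for run in design] for j in range(k)]
    nonzero = all(dot(c, c) > 0 for c in columns)
    perpendicular = all(
        dot(columns[i], columns[j]) == 0 for i, j in combinations(range(k), 2)
    )
    return nonzero and perpendicular


design = central_composite_design()
print(len(design), is_orthogonal(design))  # prints: 26 True
```

Only the main-effect columns are checked here; columns for squared or interaction terms
of a fitted model would need the same test applied to the extended model matrix.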

DoE standardization (coding) of the input factor levels by formula (9) represents a
linear transformation of the values of the original variable from the interval
$(X_{min}, X_{max})$ to the interval $(-1, 1)$ and converts the factors from their
original physical units to a dimensionless form.
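
As a minimal sketch of such a coding step, the following Python functions map a natural
value linearly so that $X_{min}$ becomes $-1$ and $X_{max}$ becomes $+1$, and back. The
conventional centre and half-range form is assumed here; the function names are
hypothetical, and using the $-1$ and $+1$ temperature levels of Table 5 (12 °C and 32 °C)
as the interval endpoints is likewise an assumption made for the example.

```python
def code_level(value, x_min, x_max):
    """Map a natural-scale value so that x_min -> -1 and x_max -> +1."""
    centre = (x_max + x_min) / 2.0
    half_range = (x_max - x_min) / 2.0
    if half_range == 0:
        raise ValueError("x_min and x_max must differ")
    return (value - centre) / half_range


def decode_level(coded, x_min, x_max):
    """Inverse mapping from the coded scale back to natural units."""
    centre = (x_max + x_min) / 2.0
    half_range = (x_max - x_min) / 2.0
    return centre + coded * half_range


# Electrolyte temperature levels from Table 5, coded with 12 and 32 deg C as -1 and +1.
for temperature in (2.0, 12.0, 22.0, 32.0, 42.0):
    print(temperature, code_level(temperature, 12.0, 32.0))
```

With these endpoints the five temperature levels map to the coded values -2, -1, 0, 1
and 2, which matches the coded scale listed in Table 5.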


4.2 Results of Experiments 


Based on the statistical analysis of the experimentally obtained data (exploratory data
analysis EDA, screening analysis, analysis of variance ANOVA, DOE analysis), we
identified the significant factors affecting the final AAO layer thickness. Using
software products such as MATLAB, Statistica, JMP and QC-Expert [11, 18, 24, 25], we
analysed how the factors interact and obtained computational statistical models
predicting the thickness of the resulting layer at varied factor levels. The data
analysis followed a statistically correct approach, including verification of the basic
assumptions and subsequent analysis of the classical regression triplet: data, model,
residuals. It can therefore be stated that the conclusions and their interpretation are
free of numerical and statistical errors, which was also confirmed by practical
experience in the field of surface treatment. It is well known that statistical
techniques are useless unless combined with appropriate subject matter knowledge and
experience [16, 17].

Table 6 The estimation of model parameters for J_A = 1 A·dm⁻²

Term        Estimate   Std Error   t Ratio   Prob > |t|   Lower 95%   Upper 95%   VIF
Intercept   12.193     0.430       28.35     <0.001*      11.290      13.097
x1          0.385      0.569       0.68      0.5069       -0.810      1.581       3
x4          4.047      0.569       7.11      <0.0001*     2.852       5.243       3
x3          1.817      0.329       5.53      <0.0001*     1.127       2.507       1
x3*x3       -1.972     0.317       -6.23     <0.0001*     -2.637      -1.307      1
x1*x1*x1    1.163      0.232       5.02      <0.0001*     0.674       1.651       3
x1*x1*x4    -2.089     0.697       -3.00     0.0077*      -3.553      -0.625      3
x1*x2*x3    -1.203     0.402       -2.99     0.0079*      -2.048      -0.357      1


The screening analysis showed that, at the chosen significance level of $\alpha = 0.05$,
the amount of aluminium in the electrolyte had no significant effect on the thickness of
the layer created at the current density $J_A = 1$ A·dm⁻². Therefore, the prediction
model was built with the DOE-standardized level values of the remaining three factors
$x_1$, $x_3$, $x_4$ and their interactions. From practical experience we know that factor
$x_2$ causes deterioration in the quality of the resulting layer. The analysis in terms
of the estimated effects (Table 6) indicates that the amount of sulphuric acid, as a
representative of the chemical factors, and both physical factors, electrolyte
temperature and anodizing time, influence the AAO layer thickness. The interactions of
these factors and the intercept term, which includes all neglected influences of the
process, were also indicated as significant. If we do not consider the intercept, the
amount of sulphuric acid has the most significant effect on the final thickness; this is
indicated by the positive sign of the estimate in the individual t-test. However, it was
necessary to examine the influence of the significant factors with the interaction
analysis approach, because it uncovers new information: increasing the total amount of
acid in interaction with the anodizing time operates in the opposite direction, i.e. it
reduces the layer thickness through its dissolution.

Based on estimation of the model parameters (Table 6) it is possible to write the
statistical prediction model for DOE standardized data in the form:

$$y_{th} = 12.19 + 4.05\,x_4 + 1.82\,x_3 - 1.97\,x_3^2 + x_1^2\,(1.16\,x_1 - 2.09\,x_4) - 1.20\,x_1 x_2 x_3 \tag{17}$$

To optimize the anodic oxidation process, two computational prediction models were
chosen, namely for deposition at the current densities $J_A = 1$ A·dm⁻² and
$J_A = 2$ A·dm⁻². This decision was based on practice. The parameter most commonly
defined in the design documentation is the thickness of the layer formed during the
anodic oxidation process. Based on practical experience, up to 95% of customers specify
the thickness of the formed layer as the basic requirement for the anodizing process.

For a predetermined layer thickness, the deposition time $x_4$ ($t$ in natural scale)
was chosen as the optimization parameter. With regard to the economic efficiency of the
anodic aluminium surface treatment process, the decisive indicator is the deposition
time. It is the longest of all technological operations (degreasing, pickling,
clarifying, sealing) and it defines the cycle of the production line. The basic objective
of the optimization is therefore to minimize the deposition time while maintaining the
required layer thickness ($th_p$, the layer thickness prescribed by the technical
documentation).


$$t(x_1, x_2, x_3)\big|_{th = th_p} \rightarrow \min \tag{18}$$


where $x_1$ is the amount of sulphuric acid m(H₂SO₄) in the electrolyte in [g·l⁻¹],
$x_2$ is the concentration of aluminium cations m(Al) in the electrolyte in [g·l⁻¹],
$x_3$ is the electrolyte temperature $T$ in [°C], $t$ is the deposition time in [min],
$th$ is the thickness of the formed layer in [µm], and $th_p$ is the prescribed thickness
of the formed layer in [µm].

In order to simplify the computations, we first use the mathematical-statistical model
(17) for the current density $J_A = 1$ A·dm⁻² in the coded scale.

If we denote by $th$ the left side of the dependence (17), we can express the anodizing
time $x_4$:

$$th = 1.16\,x_1^3 - 2.09\,x_4 x_1^2 - 1.2\,x_1 x_2 x_3 - 1.97\,x_3^2 + 1.82\,x_3 + 4.05\,x_4 + 12.19 \tag{19}$$

$$(4.05 - 2.09\,x_1^2)\,x_4 = -1.16\,x_1^3 + 1.2\,x_1 x_2 x_3 + 1.97\,x_3^2 - 1.82\,x_3 - 12.19 + th \tag{20}$$

$$x_4 = \frac{1.2\,x_1 x_2 x_3 - 1.16\,x_1^3 + 1.97\,x_3^2 - 1.82\,x_3 - 12.19 + th}{4.05 - 2.09\,x_1^2} \tag{21}$$

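
The rearrangement from the prediction model to the anodizing time can be checked
numerically. The sketch below is an illustration, not the authors' implementation: it
reads the cubic and three-factor terms of the model as $x_1^3$ and $x_1 x_2 x_3$, the
function names are hypothetical, and the test point is arbitrary.

```python
def predicted_thickness(x1, x2, x3, x4):
    """Layer thickness from the coded-scale prediction model (17)."""
    return (12.19 + 4.05 * x4 + 1.82 * x3 - 1.97 * x3 ** 2
            + x1 ** 2 * (1.16 * x1 - 2.09 * x4)
            - 1.20 * x1 * x2 * x3)


def anodizing_time_coded(x1, x2, x3, th):
    """Coded anodizing time x4 that yields thickness th, following Eq. (21)."""
    denominator = 4.05 - 2.09 * x1 ** 2
    if abs(denominator) < 1e-12:
        raise ValueError("denominator of Eq. (21) vanishes for this x1")
    numerator = (1.2 * x1 * x2 * x3 - 1.16 * x1 ** 3
                 + 1.97 * x3 ** 2 - 1.82 * x3 - 12.19 + th)
    return numerator / denominator


# Round trip at an arbitrary coded point: model -> thickness -> time.
x1, x2, x3, x4 = 0.5, -1.0, 1.0, 0.25
th = predicted_thickness(x1, x2, x3, x4)
print(th, anodizing_time_coded(x1, x2, x3, th))
```

The second printed value should reproduce the chosen $x_4$ up to rounding, which confirms
that (21) is an exact algebraic inverse of (17) with respect to $x_4$.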


According to the DOE standardization of the individual factors (9) and the natural scale
(Table 5), the developed statistical model (17) can be converted into the natural scale,
which yields the technological model of the effect of the chemical and physical factors
on the AAO layer thickness. The transformation equations for the individual factors are:

$$x_1 = \frac{m(\mathrm{H_2SO_4}) - 175}{150} \tag{22}$$

$$x_2 = \frac{m(\mathrm{Al}) - 6.62}{5.29} \tag{23}$$

$$x_3 = \frac{T - 22}{20} \tag{24}$$

$$x_4 = \frac{t - 42.5}{35} \tag{25}$$


To make the computation simpler, let us denote: $a$, the amount of sulphuric acid
m(H₂SO₄) in the electrolyte; $b$, the concentration of aluminium m(Al) in the
electrolyte; and $c$, the electrolyte temperature $T$. Considering Eq. (21) and the
transformation Eqs. (22)-(25), the objective function for the optimization of the
anodizing time can be expressed in natural scale as:

$$t = \frac{-143.933 + 10.179\,b + 0.172\,c^2 - 0.72\,a + 6.316\cdot10^{-3}a^2 - 1.209\cdot10^{-5}a^3}{-9.289\cdot10^{-5}a^2 + 3.251\cdot10^{-2}a + 1.205} + \frac{-7.706\,c - 0.4626\,bc - 1.750\cdot10^{-2}ac - 5.816\cdot10^{-2}ab + 2.644\cdot10^{-3}abc + 35\,th_p}{-9.289\cdot10^{-5}a^2 + 3.251\cdot10^{-2}a + 1.205} + 42.5 \tag{26}$$

It is clear that:

$$-9.289\cdot10^{-5}a^2 + 3.251\cdot10^{-2}a + 1.205 \neq 0 \tag{27}$$

By solving the quadratic equation obtained by setting the left side of (27) to zero, we
obtain the roots $a_1 = -33.807$ and $a_2 = 383.807$.

Because the amount of acid in the electrolyte cannot be negative, only $a_2$ can be
considered as a solution.
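
The two roots can be cross-checked in two ways, as the following Python sketch shows:
directly from the rounded natural-scale coefficients of the quadratic, and from the
coded-scale denominator $4.05 - 2.09\,x_1^2$ combined with the transformation
$x_1 = (a - 175)/150$. The function names are hypothetical.

```python
import math


def quadratic_roots(p, q, r):
    """Real roots of p*a**2 + q*a + r = 0, returned in ascending order."""
    discriminant = q * q - 4.0 * p * r
    if discriminant < 0:
        raise ValueError("no real roots")
    s = math.sqrt(discriminant)
    return tuple(sorted(((-q - s) / (2.0 * p), (-q + s) / (2.0 * p))))


def singular_acid_amounts(centre=175.0, scale=150.0):
    """Acid amounts where 4.05 - 2.09*x1**2 = 0 with x1 = (a - centre) / scale."""
    x1_root = math.sqrt(4.05 / 2.09)
    return centre - scale * x1_root, centre + scale * x1_root


# Route 1: rounded coefficients of the natural-scale quadratic.
print(quadratic_roots(-9.289e-5, 3.251e-2, 1.205))
# Route 2: coded-scale denominator mapped back to natural units.
print(singular_acid_amounts())
```

The second route reproduces the quoted roots to three decimals, while the first differs
slightly in the second decimal because the printed coefficients are rounded. The
admissible root lies well outside the acid range considered in the experiment, so the
denominator does not vanish inside the region of interest.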

Sometimes, in applications, the set of input parameters (factors) is bounded, i.e. the
input parameters take values within the allowed space of input parameters (search space
$X$). The optimization task is then to choose $n$ control variables
$x_1, x_2, \dots, x_n$ from the search space in such a way that the minimum (maximum) of
the objective function is reached, so the optimization problem has the form (2):

$$\text{minimize} \quad f_0(x_1, x_2, \dots, x_n)$$

$$\text{subject to} \quad f_i(x) \le b_i, \quad i = 1, \dots, m$$

Here the vector $x = (x_1, x_2, \dots, x_n)$ is the optimization variable of the problem,
the function $f_0 : R^n \to R$ is the objective function, the functions
$f_i : R^n \to R$ are the (inequality) constraint functions for $i = 1, \dots, m$, and
the constants $b_1, b_2, \dots, b_m$ are the limits, or bounds, for the constraints. A
vector $x^*$ is called optimal, or a solution of the problem, if it has the smallest
objective value among all vectors that satisfy the constraints. In general, the
optimization problem can also be written in the form (5) or (6).

To optimize the anodizing process, we used the software MATLAB R2016a with the
Optimization Toolbox. The optimization task was to find the minimum of the problem
defined as:

$$\min_x f(x) \quad \text{such that} \quad \begin{cases} c(x) \le 0 \\ c_{eq}(x) = 0 \\ A \cdot x \le b \\ A_{eq} \cdot x = b_{eq} \\ lb \le x \le ub \end{cases} \tag{28}$$

where $x$, $b$, $b_{eq}$, $lb$ and $ub$ are vectors, $A$ and $A_{eq}$ are matrices,
$c(x)$ and $c_{eq}(x)$ are vector functions and $f(x)$ is a scalar function. The
functions $f(x)$, $c(x)$ and $c_{eq}(x)$ are nonlinear functions [6]. The inequality
constraints were obtained by defining the boundary conditions of the anodic oxidation
process with regard to the data presented in Table 5 and Table 6. It was also important
to take into account the ranges of technological conditions used in practice and the
nature of the process, so we can define the following constraints:


$$165 \le m(\mathrm{H_2SO_4}) \le 225 \tag{29}$$

$$1.5 \le m(\mathrm{Al}) \le 9 \tag{30}$$

$$12 \le T \le 24 \tag{31}$$


These constraints make the optimization solution feasible, i.e. acceptable in practice.
They are called behavioural constraints. The vector $x$ represents the inputs that we can
change to improve the system behaviour. Table 7 shows the outputs of the optimization for
the individual prescribed thicknesses of the layer formed during the anodizing process at
the current density $J_A = 1$ A·dm⁻².

According to the objective function (18), or its form (26), and the constraints, which
are linear, for the optimization problem (28) it can be written:

$$A = \begin{bmatrix} -1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix} \tag{32}$$

$$b = [-165;\ 225;\ -1.5;\ 9;\ -12;\ 24] \tag{33}$$


Table 7 The results of the optimization process of anodizing at J_A = 1 A·dm⁻²

Required layer thickness [µm]   m(H2SO4) [g·l⁻¹]   m(Al) [g·l⁻¹]   T [°C]   t_optim [min]
10                              165.806            9               24       48.0820
12                              167.309            9               24       49.8132
14                              168.435            9               24       51.5436
15                              168.894            9               24       52.4086
20                              170.515            9               24       56.7323
25                              171.479            9               24       61.0548
30                              172.109            9               24       65.3768
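
To illustrate how the matrix form encodes the interval constraints on acid amount,
aluminium concentration and temperature, the following Python sketch tests whether a
point $x = (a, b, c)$ satisfies $A \cdot x \le b$ and applies the test to the operating
conditions listed in Table 7. The helper name and the tolerance are assumptions of this
example; it does not reproduce the optimization itself.

```python
# Rows encode: -a <= -165, a <= 225, -b <= -1.5, b <= 9, -c <= -12, c <= 24.
A = [
    [-1, 0, 0],
    [1, 0, 0],
    [0, -1, 0],
    [0, 1, 0],
    [0, 0, -1],
    [0, 0, 1],
]
b = [-165, 225, -1.5, 9, -12, 24]


def is_feasible(x, tol=1e-9):
    """True if every row of A.x <= b holds (within a small tolerance)."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i + tol
        for row, b_i in zip(A, b)
    )


# Operating conditions (m(H2SO4), m(Al), T) from Table 7, keyed by thickness in um.
table7 = {
    10: (165.806, 9, 24),
    12: (167.309, 9, 24),
    14: (168.435, 9, 24),
    15: (168.894, 9, 24),
    20: (170.515, 9, 24),
    25: (171.479, 9, 24),
    30: (172.109, 9, 24),
}
for thickness, point in table7.items():
    print(thickness, is_feasible(point))
```

All listed points pass the check; the aluminium concentration and the temperature sit
exactly on their upper bounds, which is why a small tolerance is used in the comparison.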


Table 7 presents the resulting values of the objective function, i.e. the optimal
deposition time for each desired thickness of the AAO layer formed at the current density
$J_A = 1$ A·dm⁻², together with the corresponding conditions of the anodizing process.

Based on the results presented in Table 7, it can be stated that the resulting deposition
time is influenced mostly by the amount of sulphuric acid in the electrolyte. The optimal
(minimum) value of the deposition time $t_{opt}$ can be achieved by increasing the amount
of sulphuric acid m(H₂SO₄) in the electrolyte while keeping the other variables at
constant values, i.e. m(Al) = 9 g·l⁻¹ and $T$ = 24 °C.

The graphical representation of optimization outputs is given in Figs. 6 and 7. 


5 Conclusion 


In many published works and studies on surface treatment processes (zinc plating,
anodizing), only the COST approach (consider one separate factor at a time) has been
used; it is therefore desirable to analyse these technological processes in their full
complexity. The COST approach is clearly not suitable for experimental work analysing
technological processes of surface treatment, where it is important to observe the mutual
influence of several factors acting at the same time [16, 17]. The DOE approach is more
suitable. It is presented in this chapter on a particular problem from practice, in which
a prediction equation, established by statistical analysis of experimentally obtained
data from the processes of electrolytic alkaline zinc plating and anodic oxidation, was
optimized by non-linear programming.

The results of the experimental research carried out by the DOE methodology show that the
thickness of the deposited zinc layer depends mainly on the combination of chemical and
physical factors during electrolytic zinc coating in slightly acid electrolytes. These
dependencies have to be understood as non-linear relations, in which the mutual
interactions of the factors play the crucial role. The results show that the thickness of
the deposited zinc layer rises with an increasing amount of zinc chloride as the basic
source of zinc in the electrolyte. This dependence is non-linear. It is possible to
define a certain critical amount of zinc chloride in the electrolyte at which the layer
thickness reaches its maximum for the observed electrolyte temperature, deposition time
and voltage.

Fig. 6 Graphical representation of optimization outputs for the required thickness of th = 10 [µm]

This chapter demonstrates the usefulness of mathematical and statistical methods, the DOE
methodology and optimization when evaluating experimentally obtained data in engineering
practice in order to improve “system behaviour”. We were able not only to identify the
significant factors and their interactions affecting the response (the thickness of the
formed layer, i.e. zinc coating thickness and AAO layer thickness), but also to create
soft mathematical computational models. Based on these models, it is possible to predict
the final thickness of the formed layer for the considered factors and their
interactions. The processes of surface treatment (aluminium anodic oxidation and zinc
plating) are multifactorial and nonlinear systems, so the application of nonlinear
programming to save time, as a way of increasing the efficiency of the technological
process, helps us to optimize the zinc plating process and the process of anodic
oxidation. In this chapter, we discussed selected optimization methods and mathematical
models.

Fig. 7 Graphical representation of optimization outputs for the required thickness of th = 30 [µm]

The main contribution of our research is that optimal conditions of the surface treatment
processes were found, so that the demands of final customers can be fulfilled. The
results have important benefits for technical practice, because they were verified under
conditions of real production.


Applicability of Selected Mathematical
Statistics Methods During Decision
Activities Within Joint CBRN Defence
COE


Břetislav Štěpánek, Pavel Otřísal, and Šárka Hošková-Mayerová


Abstract Mathematical-statistical methods provide valid tools for analysing problems on
the basis of collected relevant information and for assessing the possible consequences
of various decisions. Applying these methods in a command and control process makes it
possible to find solutions to the problems we can meet within NATO. These methods
provide significant information; however, we have to interpret it correctly, judge its
relevance and determine how to make maximum use of it. The information gained in this way
forms only one part of a decision process within the Joint CBRN Defence COE.


Keywords Cost-benefit analysis · Decision making · Command and control
process · Joint CBRN Defence COE


1 Introduction 


In connection with the restructuring of the NATO Command Structure in 2003-2004, it was
decided to create so-called “Centres of Excellence”, whose task is to improve the quality
of education and training, to increase the level of interoperability and capabilities,
and to support doctrine development and the experimental consideration and evaluation of
new concepts, always in a given specific area according to the delegated
military-political mandate [1-3].

These Centres are nationally or multinationally sponsored facilities which provide
special capabilities and experience for the benefit of the whole Alliance, above all in
support of its transformation. They make it possible to raise education and training to a
qualitatively higher level, to improve interoperability and capabilities in a given area,
and to participate in the development of NATO doctrines (above all STANAGs, service
instructions and military publications) and concepts through their certification and
subsequent cooperation [4].

At present, member states offer NATO 25 specialized Centres, each focused on a different
specialized area. After assessing all possibilities, the Czech Republic decided to commit
itself to specializing the Czech Armed Forces in protection against weapons of mass
destruction and accepted the role of host country in building an International Military
Organisation in the form of a COE on its territory [5, 6]. Within the NATO structures,
this element is known as the Joint Chemical, Biological, Radiological and Nuclear Defence
Centre of Excellence (Joint CBRN Defence COE). In fulfilling its strategic mission, this
international organization cooperates with national universities and other institutions
[7].


2 Chosen Methods in Support of Decision Activities Within 
Joint CBRN Defence COE 


Mathematical-statistical methods provide proven instruments suitable for analysing
problems by acquiring relevant information and determining the possible consequences of
alternative decisions. It cannot be claimed, however, that applying these methods in
decision activities always makes it possible to solve all the problems we meet in the
NATO environment. These methods can nevertheless provide important information. The
information gained has to be interpreted correctly, its reliability has to be assessed,
and the possibility of its maximum utilization has to be determined. Even so, information
gained in this way represents only one part of decision activities [8, 9].

Every managerial problem has to be examined in terms of both quantity and quality.
Quantitative criteria allow direct assessment and comparison of variants on the basis of
predefined values. We use them when the required quantities can be measured reliably. If
qualitative criteria are to be used, they have to be transformed into measurable values.
For example, the qualitative feature “reliability” can be expressed on an interval scale.
The next step is the sorting and selection of variants, for which we use various methods
and techniques. The necessary information must be acquired at the same time and
considered in the light of the particular problem. The aim is to adopt the optimal
solution variant on the basis of these two sources of information [10, 11].

2.1 Statistical Conception and Risk Processing


In investment decision-making, we calculate the expected return on investment (ov) as the
probability-weighted average of the potential profitabilities (pv), and the riskiness
(risk rate) as the probability-weighted average of the deviations of pv from ov. By the
standard deviation of profitability we understand the spread σ (dispersion) of the normal
distribution of pv, and by the “slimness of the Gaussian curve” the variation coefficient
[12, 13].

On this basis, we can state that riskiness expresses how tightly the pv values cluster
around ov. This implies that capital projects with a higher density of clustering are
less risky, because they have a smaller spread and a slimmer Gaussian curve [14, 15].

For the calculations, we set the standard deviation σ equal to the percentage level of
absolute risk, so we can write the relation: σ = risk charge of investment = r.

Mean value of expected cash flows:

$$\overline{PT} = \sum_{j=1}^{n} PT_j \cdot p_j \tag{1}$$

where:

- $PT_j$ are the individual cash flows of the different variants;
- $p_j$ is the probability that the individual cash flow occurs;
- $n$ is the number of expected cash flow variants.

It holds that $\sum_{j=1}^{n} p_j = 1$ and $0 \le p_j \le 1$.

To express the degree of risk in investment projects fully, it is necessary to compare
the deviations of the individual cash flows from the average expected value; higher
deviations signal a riskier project [16-18].

Spread (variance) of the expected cash flows of the investment variants:

$$\sigma^2 = \sum_{j=1}^{n} \left(PT_j - \overline{PT}\right)^2 \cdot p_j \tag{2}$$

Standard deviation:

$$\sigma = \sqrt{\sigma^2} \tag{3}$$

The bigger the standard deviation of the cash flows of a project, the bigger its risk. To
compare the riskiness of projects with substantially different expected average cash
flows, we use the variation coefficient $V$:

$$V = \frac{\sigma}{\overline{PT}} \tag{4}$$

The higher the variation coefficient, the higher the investment risk.


2.2 Numerically Expressed Risk 


Risk can be expressed in numbers, i.e. quantitatively, and for the purposes of financial
decision-making it can be processed further numerically using the calculus of
probability. The probability that a given cash flow will occur in the future can be
quantified as a percentage chance of its realization.
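
Once probabilities are attached to the cash-flow variants, the mean value, spread,
standard deviation and variation coefficient from Sect. 2.1 can be evaluated directly.
The Python sketch below is only an illustration: the function name is hypothetical and
both sets of cash flows and probabilities are invented example data, not results from
this chapter.

```python
import math


def risk_profile(cash_flows, probabilities):
    """Return (mean, variance, standard deviation, variation coefficient)."""
    if len(cash_flows) != len(probabilities):
        raise ValueError("one probability is needed per cash flow")
    if any(p < 0 or p > 1 for p in probabilities):
        raise ValueError("probabilities must lie between 0 and 1")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Probability-weighted mean of the cash flows.
    mean = sum(pt * p for pt, p in zip(cash_flows, probabilities))
    # Probability-weighted squared deviations from the mean.
    variance = sum((pt - mean) ** 2 * p for pt, p in zip(cash_flows, probabilities))
    std_dev = math.sqrt(variance)
    if mean == 0:
        raise ValueError("variation coefficient is undefined for a zero mean")
    return mean, variance, std_dev, std_dev / mean


# Two hypothetical projects with the same expected cash flow but different spread.
project_a = risk_profile([80.0, 100.0, 120.0], [0.25, 0.5, 0.25])
project_b = risk_profile([20.0, 100.0, 180.0], [0.25, 0.5, 0.25])
print(project_a)
print(project_b)
```

Both hypothetical projects have the same mean of 100, but the second has a markedly
larger standard deviation and variation coefficient and would therefore be judged the
riskier one.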


The probability that a financial flow occurs can be determined:


- objectively, i.e. by statistical analysis of historical data (provided that
  statistically significant data are available);
- subjectively, i.e. by an estimate based on consideration of the effects of risk
  factors and on expert experience.


3 Methods of Useful Cost Analysis


For the international environment of NATO, suitable supporting methods are those of
useful cost analysis (cost-output methods), for the following reasons:


- they make it possible to measure inputs (the costs of the public policy actions under
  consideration) in relation to the expected outputs (the results of public policy);
- their application is relatively easy and at the same time provides enough of the
  information needed for the final decision;
- they are based on the criteria of economy, efficiency and usefulness, i.e. criteria
  that every public sector manager who uses public funds is obliged to follow.


The cost-output methods comprise four basic methods:


- cost minimization analysis (CMA): for assessing plans and projects whose variants have
  qualitatively and quantitatively analogous and comparable outputs. It is frequently
  used in public tenders where price is the only evaluation criterion. The selection
  criterion is the lowest price;
- costs and benefits analysis (CBA): for assessing investment actions when it is possible
  and suitable to express the results of public policy in money. The selection criterion
  is the best ratio of benefits to costs;
- cost efficiency analysis (CEA): for assessing the results of spending public funds when
  these are expressed in natural (in-kind) units. The selection criterion is the lowest
  cost per natural unit of output;
- cost utility analysis (CUA): for assessing the results of spending public funds when we
  measure the degree to which an aim is fulfilled and the degree of satisfaction with
  regard to the costs spent.

The cost-output methods are suitable instruments for selecting the best variant. Each of
them highlights a specific criterion, so the appropriate method should be chosen with
regard to the evaluation criterion.

It is characteristic of the cost-output methods that all of them measure inputs (costs)
in monetary units. This means that if we want to use any of them for assessing variants,
we have to know the quantified costs of the actions given by the policy. This is a
realistic prerequisite, because the necessary data can be found in the costing of the
given action, in the budget, or in the accounting.


3.1 Cost Minimization Analysis


Cost minimization analysis (CMA) ranks among the relatively simplest methods. Its
principle is to find the variant with the lowest costs; the selection criterion is
therefore the lowest price. One condition must be fulfilled when selecting a variant:
each assessed variant has to comply with the assumed standard, or to meet certain
predefined (expected) utility properties. If this is not so, using the method can lead to
inaccurate conclusions, and it is more suitable to use another cost-output method. If the
outputs can also be expressed in monetary units, the CBA method (costs and benefits
analysis) is suitable.
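
The selection rule of CMA, including its precondition, can be sketched as follows. The
class, the function name and the three bids are hypothetical illustrations; the only
logic taken from the text is that variants failing the required standard are excluded
before the lowest price is chosen.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    cost: float           # total cost in monetary units
    meets_standard: bool  # does the variant satisfy the predefined standard?


def cost_minimization(variants):
    """Pick the cheapest variant among those that meet the standard."""
    eligible = [v for v in variants if v.meets_standard]
    if not eligible:
        raise ValueError("no variant meets the required standard")
    return min(eligible, key=lambda v: v.cost)


# Hypothetical tender: the cheapest bid fails the standard and is excluded.
bids = [
    Variant("bid_A", 950_000.0, False),
    Variant("bid_B", 1_020_000.0, True),
    Variant("bid_C", 1_100_000.0, True),
]
print(cost_minimization(bids).name)  # prints: bid_B
```

Filtering before minimizing mirrors the warning above: comparing prices of variants with
non-comparable outputs would lead to inaccurate conclusions.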


3.2 Costs and Benefits Analysis


With this method we seek an answer to the question of what monetary effects the assessed
variant brings. CBA is the only cost-output method that measures both inputs and outputs
in monetary units. At the beginning of the costing we have to decide whether to calculate
the cost and benefit flows in nominal terms (i.e. including inter-annual price
fluctuations) or in constant prices. In terms of so-called social cost and social benefit
analysis, we distinguish two basic types (forms) of CBA.

The first type is the so-called narrower costs and benefits analysis, usually referred to
simply as cost and benefit analysis. With this analysis we quantify the so-called direct
costs, which have an immediate relation to the given investment, and the so-called direct
benefits accruing to the target group. Costs here represent a loss directly related to
the given investment. Benefits are the utilities, expressed in monetary units, that have
a positive impact on the target group under consideration. These costs and benefits are
relatively easy to quantify, because the costs relate directly to the given investment
and the benefits directly concern the project objective and its target group.

The second type is the so-called wider CBA. Its principle is the costing of social costs
and social benefits: in addition to direct costs and direct benefits, so-called indirect
costs and indirect benefits, related to negative and positive externalities, are also
calculated. Social costs and social benefits relate to society as a whole.

Within social costs and benefits analysis we can distinguish the so-called reduced and
non-reduced forms. For the wider cost and benefit analysis we use the term “social costs
and social benefits analysis”. In this case we also calculate opportunity costs and, on
the benefit side, the impacts that the given action potentially has on all members of
society regardless of the target group. We also consider non-material harm, such as the
noise from motorway construction and the associated damage to the health of inhabitants.
These are “indirect” (induced) costs, which potentially concern every member of society.
The motorway example also brings positive effects, such as the time saved through shorter
travel times. All of these are non-material effects of an indirect character. Because the
potential recipient of both the utilities and the negative impacts can be the community
as a whole, we speak of social costs and benefits analysis. By a benefit we understand
every increase in utility, by a cost its decrease. In the non-reduced form of social
costs and social benefits analysis, we value all social benefits and all social harm in
monetary units, e.g. the harm from noise caused to inhabitants who live close to a
motorway under construction, or the harm from a landscape disfigured by the construction
of a nuclear power station, as well as all (including non-material) benefits, such as the
benefit to drivers of the time saved thanks to a new motorway.

It is often practically impossible to quantify all social benefits and all social harm
exactly in monetary units. In this case we can use the so-called reduced form of social
costs and social benefits analysis, in which we quantify only those items that can be
specified relatively exactly. Effects on the side of social benefits and social costs
that are difficult to quantify are expressed verbally and accompanied by a commentary.

To analyze the consequences of a decision within CBA, we can use the so-called incremental method of calculating costs and benefits. It compares the consequences on the side of benefits and harms (costs) after the investment variant has been implemented with the starting position before the investment. The position before the investment is called the zero variant: a position that assumes no change. The zero variant is the baseline for determining costs and benefits. A variant that assumes a change is a non-zero variant: a proposal to change the position by means of an investment project. When calculating costs and benefits, we include in the investment variant only those costs and benefits that are a real increase directly related to that variant. In other words, when building the costing matrix we ask: which costs and benefits would not arise under the zero variant, that is, if the project were not implemented? The answer defines the entries on the cost side and on the benefit side. The incremental method thus seeks the net benefit of the project, that is, the difference between the expected benefits and the costs of the investment.
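The incremental principle can be restated compactly. The symbols below are notation introduced here for illustration and do not appear in the original: $B_1, C_1$ are the benefits and costs under the investment (non-zero) variant, and $B_0, C_0$ those under the zero variant.

```latex
\[
  \text{net benefit} \;=\; \underbrace{(B_1 - B_0)}_{\text{incremental benefits}}
  \;-\; \underbrace{(C_1 - C_0)}_{\text{incremental costs}}
\]
```

Only the differences against the zero variant enter the calculation, so items that would arise in any case cancel out.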
CBA is a method in which we calculate so-called social costs and social benefits. A benefit is any increase in utility; a cost is any decrease in utility. The decrease in utility is measured by the opportunity costs of the appraised project, by which we understand the value of the alternative action that is given up. CBA is suitable, for example, for supporting personnel decision-making [19]. In personnel practice the method is very useful, e.g., in selecting educational [20] and retraining programmes, when both inputs and outputs can be quantified in value units (i.e. in CZK).


3.3 Cost Efficiency Analysis


In CEA we calculate inputs (costs) in monetary units and measure outputs in natural (physical) units. We observe the cost per unit of output. The best variant is the one with the lowest cost per unit of output. An advantage of CEA is that we do not have to value intangible items, so the often complicated valuation of intangibles required by CBA is avoided. On the other hand, CEA is limited in that it can only compare variants that produce the same kind of output. The choice of natural units (a prerequisite for measuring output) can be problematic when the flow of utilities is heterogeneous.

CEA (cost efficiency analysis) is widely applicable to the choice of personnel programmes and activities. In practice we may face a decision problem in which costs can be quantified in monetary units but outputs are better quantified in natural units, e.g. the number of persons served (educational institutions, health services) or the number of units supported (logistic bases, military finance offices).
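The CEA rule described above (lowest cost per unit of output wins) can be sketched in a few lines of Python. The programme names and figures are hypothetical and serve only as an illustration; they are not taken from the original.

```python
def unit_cost(total_cost, output_units):
    """Cost per unit of output (e.g. CZK per person trained)."""
    if output_units <= 0:
        raise ValueError("output must be positive")
    return total_cost / output_units


def most_cost_efficient(variants):
    """Return the name of the variant with the lowest cost per output unit."""
    return min(variants, key=lambda name: unit_cost(*variants[name]))


# Hypothetical variants: (total cost in CZK, number of persons trained).
variants = {
    "programme_A": (1_200_000, 60),
    "programme_B": (900_000, 40),
    "programme_C": (1_500_000, 80),
}

for name, (cost, trained) in variants.items():
    print(name, unit_cost(cost, trained))
print("preferred:", most_cost_efficient(variants))
```

The comparison is meaningful only because all three variants deliver the same kind of output, which is exactly the limitation of CEA noted in the text.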


3.4 Cost Utility Analysis


CUA is based on utility theory. Utility is understood here as the subjectively felt satisfaction from the proposed project. The person who appraises the given variants can express his or her satisfaction on a rating scale. For utility to be formulated correctly, the rating scale should always be accompanied by verbal descriptions. Utility formulated in this way is then monitored for the individual variants in relation to the costs of each appraised variant. We ask how much additional utility an additional unit of cost brings in the monitored variants. This change is then evaluated, and the best variant is recommended for implementation.
4 Application of CBA During Decision Activities Within Joint CBRN Defence COE


In the practice of the international military organization Joint CBRN Defence COE, many situations have arisen in which one project or a whole group of projects, with different features and under different assumptions, had to be considered. These decisions used the indicators NPV and NPV/I, which are defined below.

NPV is given by the relation:

$$\mathrm{NPV} = \sum_{t=0}^{n} \frac{CF_t}{(1+r)^t} \qquad (5)$$

where:

$CF_t$ is the nominal cash flow in year t;
t takes the values from 0 to n;
n is the lifetime of the project;
r is the discount rate.


Calculation of the profitability index NPV/I:

$$\mathrm{NPV}/I = \frac{PV + CF_0}{|CF_0|} = \frac{\sum_{t=1}^{n} \frac{CF_t}{(1+r)^t} + CF_0}{|CF_0|} \qquad (6)$$


or 


(7) 


where:

$CF_0$ is the cash flow in year 0, i.e. the investment expenditure;
r is the discount rate;
n is the lifetime of the project;
PV is the present value.
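The two indicators can be sketched in Python as follows. This is a minimal illustration under stated assumptions: cash flows are given as a list indexed by year, the year-0 entry is a negative outlay, and the profitability index is taken as NPV divided by the size of that outlay, which is how the NPV/I values in Table 2 relate to NPV and capital expenditure. The sample project is hypothetical.

```python
def npv(cash_flows, rate):
    """Net present value of cash_flows[t] received in year t = 0..n."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def profitability_index(cash_flows, rate):
    """NPV per unit of initial outlay; cash_flows[0] must be a negative outlay."""
    outlay = -cash_flows[0]
    if outlay <= 0:
        raise ValueError("cash_flows[0] must be a negative investment outlay")
    return npv(cash_flows, rate) / outlay


# Hypothetical project: invest 100 now, receive 60 in each of the next two years.
flows = [-100, 60, 60]
print(round(npv(flows, 0.10), 2))                  # 4.13
print(round(profitability_index(flows, 0.10), 4))  # 0.0413
```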
4.1 Utilization of Indicators NPV and NPV/I on Instances from Budgetary Restrictive Practice


Example of a dependent project “Modelling and simulation centre construction”


A project is called dependent if its acceptance supports, and frequently conditions, the realization of a second project. An example of a dependent project is the construction of the “Modelling and simulation centre” (MSC), which at the same time allows the activity of the Joint CBRN Defence COE to be extended in the area of NATO CBRN problems.

There were three variants: A, B and C. Every variant of the MSC construction was equally demanding in terms of resources, but because of its different reach in other NATO countries each variant promised different benefits. Since the criterion of choice was utility maximization, CBA was the appropriate method.

Table 1 shows that, if only one of the variants is realized, variant B brings the highest utility according to the NPV indicator: the NPV of project B amounted to 150 mil. It was also possible to consider a combination of projects, provided that their joint benefit justified double the capital expenditure.

Since the budget that could be released amounted to 150 mil., there was no option but to choose variant B.

The NPV of several projects realized together is lower than the sum of the NPVs of the single projects because the costs of each MSC construction stay the same while the benefit for the individual NATO member countries falls. The law of diminishing marginal utility comes into play, and the NPV indicator reveals it.
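The two observations above can be checked directly against the figures of Table 1. The sketch below is only an illustration: the function names are introduced here, and the budget is assumed to constrain the PV of costs.

```python
# Table 1, in mil. CZK: combination -> (PV benefits, PV costs)
table1 = {
    ("A",): (200, 100),
    ("B",): (250, 100),
    ("C",): (210, 100),
    ("A", "B"): (360, 200),
    ("A", "C"): (350, 200),
    ("B", "C"): (300, 200),
    ("A", "B", "C"): (380, 300),
}


def npv(combo):
    benefits, costs = table1[combo]
    return benefits - costs


def shortfall(combo):
    """Sum of the stand-alone NPVs minus the NPV of the combination."""
    return sum(npv((name,)) for name in combo) - npv(combo)


def best_within_budget(budget):
    """Combination with the highest NPV whose PV costs fit the budget."""
    affordable = [c for c, (_, costs) in table1.items() if costs <= budget]
    return max(affordable, key=npv, default=())


print(best_within_budget(150))   # ('B',)
print(shortfall(("A", "B")))     # 90
```

A positive shortfall for every combination is the numerical face of the diminishing marginal utility mentioned in the text.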


Example of an independent project “Cooperation with other COEs”


Projects are independent if they are neither mutually exclusive nor dependent. Choosing one of them is no reason not to realize the other variants, nor does it influence them (Table 2).


Table 1 Example: dependent projects (in mil. CZK)

MSC construction   PV benefits   PV costs   NPV
A                  200           100        100
B                  250           100        150
C                  210           100        110
A, B               360           200        160
A, C               350           200        150
B, C               300           200        100
A, B, C            380           300         80

Source: FUGUITT, D., WILCOX, S. J. Cost-Benefit Analysis for Public Sector Decision Makers
Table 2 Example: role of the NPV indicator in the choice of a group of independent projects (in mil. CZK)

Cooperation with other COEs   EOD COE   MILMED COE   CIMIC COE   ARC COE   CCD COE
PV benefits                   17        67           45          100       1
PV operational expenses       2         7            5           10        0
PV capital expenditure        5         25           20          50        2
NPV                           10        35           20          40        -1
NPV/I                         2         1.4          1           0.8       -0.5
B/C                           4.42      2.09         1.80        1.67      0.50
Sequence according to NPV/I   1         2            3           4         5
Sequence according to B/C     1         2            3           4         5


In this case it was necessary to choose the project combination that brings the highest efficiency, that is, the maximal net utility per unit of investment. Here too the NPV indicator could be used, in the form of the profitability index NPV/I.

The NPV indicator itself was not ideal in this case, because it prefers the alternative with the highest net utility regardless of the initial expenditure. In our case the budget was limited to 15 mil.; the “EOD COE” project was therefore chosen, because its NPV was 10 mil. and its NPV/I was the highest.

In contrast to NPV, the profitability index is a relative indicator: it can rank project variants by the ratio of NPV to investment costs. Thanks to this property it is easier to determine which projects to prefer under a budget limitation.

The case of independent projects with a budget limitation is quite specific, and no single indicator fixed in advance can be followed mechanically. Nevertheless, a clear criterion for project choice can be found: identify all project combinations that the budget can finance and choose the one that reaches the maximal NPV.
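The criterion in the last paragraph (enumerate every combination the budget can finance, then take the one with maximal NPV) can be sketched as a brute-force search over the Table 2 figures. Two assumptions are made here and are not stated in the original: the budget constrains the PV of capital expenditure, and the NPVs of independent projects simply add up.

```python
from itertools import combinations

# Table 2, in mil. CZK: project -> (NPV, PV capital expenditure)
projects = {
    "EOD": (10, 5),
    "MILMED": (35, 25),
    "CIMIC": (20, 20),
    "ARC": (40, 50),
    "CCD": (-1, 2),
}


def best_combination(projects, budget):
    """Return (combination, total NPV) with maximal NPV within the budget."""
    best, best_npv = (), 0            # doing nothing has NPV 0
    names = sorted(projects)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            cost = sum(projects[name][1] for name in combo)
            total = sum(projects[name][0] for name in combo)
            if cost <= budget and total > best_npv:
                best, best_npv = combo, total
    return best, best_npv


print(best_combination(projects, 15))   # (('EOD',), 10)
```

With the 15 mil. budget the search returns the EOD COE project alone, in line with the choice described in the text; exhaustive enumeration is practical here only because the number of projects is small.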


5 Conclusion 


The CUA method can be applied in the international environment of the Joint CBRN Defence COE, for example, in the evaluation and selection of “complex system defence projects”, i.e. projects in which technical, personnel, military-specific, economic, political and other targets are considered. To determine utility it is then important to consider the priorities of the individual targets, or the importance of the individual utility properties.

Above all, the CBA method can be recommended, namely as support for argumentation in political discussions about the contribution of extensive military projects in NATO (e.g. the purchase of new armoured vehicles, airplanes, etc.), because this method makes it possible to view the question in the light of the overall structure of military costs and social benefits.

In the armed forces, personnel with decision-making authority deal with scarce resources that should be utilized optimally. This decision-making requires suitable mathematical, economic and statistical methods as instruments of decision support.
Subgroups of Abelian Group


R. Ameri and A. Kialashaki 


Abstract The purpose of this paper is to count the fuzzy subgroups of an abelian group $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^m}$, where p is a prime number and n, m are natural numbers, up to isomorphism. To this end we use a special equivalence relation on the fuzzy subgroups of G and the chains of subgroups of G that terminate in G, and apply this concise description to the classification of the fuzzy subgroups of G. First, we solve the problem for an arbitrary positive integer n and for m = 1, 2, 3, 4, 5. Then, based on the observed results and on the data pertaining to this work, which have been checked by means of a computer program, we conjecture a combinatorial formula for the number of fuzzy subgroups of G and give a sketch of the proof for the general case.


Keywords Fuzzy group · Accounting · Equivalent · Chain · Large Schroeder numbers


1 Introduction 


A fuzzy subset $\mu$ of a set X is a function from X to the closed unit interval [0, 1]. The concept of a fuzzy subset of a set was introduced by Zadeh [21] in 1965. Zadeh’s ideas opened new directions for researchers worldwide. The concept has many applications in various fields, such as computer engineering, artificial intelligence, control engineering, operations research, management sciences and many more [10]. A large body of research shows its importance and applications in set theory, algebra, real analysis, measure theory and topological spaces [7, 9]. Rapid theoretical developments and practical applications based on the concept of a fuzzy subset are in progress in various fields. In 1971, Azriel Rosenfeld introduced the notion of a fuzzy subgroup [17]. Also, Ameri et al. [4] applied fuzzy sets to algebraic hyperstructures. Another interesting approach deals with fuzzy hyperstructures. More details can be found in e.g. [3, 5, 6, 8].

One of the most important problems of fuzzy group theory is to classify the fuzzy subgroups of a finite (abelian) group [15]. This topic has undergone a rapid evolution in the last few years. Several papers have treated the particular case of finite cyclic groups. In [12] the number of distinct fuzzy subgroups of a finite cyclic group of order $p^n q^m$, where p and q are distinct prime numbers and m, n are positive integers, was obtained, while [19] deals with this number for cyclic groups of order $n = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$, where $p_1, \ldots, p_k$ are distinct prime numbers. In [20] the number of fuzzy subgroups of finite cyclic groups was obtained.

In this paper we determine the number of fuzzy subgroups of the abelian groups of the form $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^m}$, for a prime number p and positive integers m, n; equivalent fuzzy subgroups of finite abelian p-groups of rank two and their subgroup lattices were also studied in [16]. To this end, we count the chains of subgroups and consider an equivalence relation on fuzzy subgroups in order to enumerate the fuzzy subgroups of G (for more details see [9]). We first solve the problem for an arbitrary n and m = 1, 2, 3, 4, 5 and then, based on the observed results and on the data pertaining to this work, which have been checked by means of a computer program, we conjecture a formula for the general case. In Sect. 2 we present some basic notions and results on fuzzy subgroups that we need in the paper. In Sect. 3 we enumerate, with respect to the equivalence relation $\sim$, the fuzzy subgroups of finite abelian p-groups of rank two, where p is any prime number. We use the results on the number of chains of subgroups of G which end in G to count the fuzzy subgroups of G for an arbitrary positive integer n and m = 1, 2. Finally, in Sect. 4 we continue the method of Sect. 3 and solve the problem for m = 3, 4, 5; based on the observed results and on the data pertaining to this work, we conjecture a combinatorial formula for the number of fuzzy subgroups of $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^m}$ and give a sketch of the proof for the general case.


2 Preliminaries 


First of all, we present some basic notions and results on fuzzy subgroups that we need in the paper (for more details see [1, 2, 14, 16]). Let $(G, \cdot, e)$ be a group with identity element e, and let $\mu : G \to [0, 1]$ be a fuzzy subset of G. We say that $\mu$ is a fuzzy subgroup of G if it satisfies the following conditions:


(a) $\mu(xy) \ge \min\{\mu(x), \mu(y)\}$, for all x, y in G;
(b) $\mu(x^{-1}) \ge \mu(x)$, for any $x \in G$.

In this situation we have $\mu(x^{-1}) = \mu(x)$, for any $x \in G$, and $\mu(e) = \max \mu(G)$.
For any $\alpha \in [0, 1]$, we define the level subset of a fuzzy subset $\mu$ as follows:


$${}_{\mu}G_{\alpha} = \{x \in G \mid \mu(x) \ge \alpha\}.$$


These subsets allow us to characterize the fuzzy subgroups of a given group G in the following manner:


Theorem 2.1 A fuzzy subset $\mu$ of a group G is a fuzzy subgroup of G if and only if every nonempty level subset of $\mu$ is a subgroup of G.
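Theorem 2.1 can be illustrated on a small example. The sketch below works in the cyclic group $\mathbb{Z}_n$ written additively (so $xy$ becomes $x + y$ and $x^{-1}$ becomes $-x$); the two membership functions are hypothetical examples chosen here, not taken from the paper.

```python
def is_fuzzy_subgroup(mu, n):
    """Check conditions (a) and (b) for mu: Z_n -> [0, 1] (additive notation)."""
    for x in range(n):
        if mu[(-x) % n] < mu[x]:
            return False
        for y in range(n):
            if mu[(x + y) % n] < min(mu[x], mu[y]):
                return False
    return True


def is_subgroup(subset, n):
    """Subgroup test in Z_n: contains 0 and is closed under x - y."""
    return 0 in subset and all((x - y) % n in subset for x in subset for y in subset)


def level_subsets_are_subgroups(mu, n):
    """Every nonempty level subset {x : mu(x) >= alpha} is a subgroup of Z_n."""
    for alpha in set(mu.values()):   # other thresholds repeat one of these sets
        level = {x for x in range(n) if mu[x] >= alpha}
        if not is_subgroup(level, n):
            return False
    return True


good = {0: 1.0, 1: 0.2, 2: 0.2, 3: 0.7, 4: 0.2, 5: 0.2}   # levels {0}, {0,3}, Z_6
bad = {0: 1.0, 1: 0.9, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1}    # level {0,1} is no subgroup

for mu in (good, bad):
    print(is_fuzzy_subgroup(mu, 6), level_subsets_are_subgroups(mu, 6))
```

Both tests agree on each example, as the theorem predicts.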


Fuzzy subgroups of a group G can be classified up to some natural equivalence relations on the set consisting of all fuzzy subsets of G. One of these equivalence relations is as follows (for more details see [11, 19, 20]):


$$\mu \sim \eta \iff \bigl(\mu(x) > \mu(y) \iff \eta(x) > \eta(y), \ \forall x, y \in G\bigr),$$


and two fuzzy subgroups $\mu$ and $\eta$ of G are said to be distinct if $\mu \nsim \eta$. This equivalence relation was generalized in [12, 13]. It is also closely connected to the concept of level subgroups. Suppose that the group G is finite and let $\mu : G \to [0,1]$ be a fuzzy subgroup of G. Let $\mu(G) = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ and assume that $\alpha_1 > \alpha_2 > \cdots > \alpha_r$. Then $\mu$ determines the following chain of subgroups of G, which ends in G:


$${}_{\mu}G_{\alpha_1} \subset {}_{\mu}G_{\alpha_2} \subset \cdots \subset {}_{\mu}G_{\alpha_r} = G. \qquad (*)$$

Moreover, for any $x \in G$ and $i = 1, 2, \ldots, r$, we have

$$\mu(x) = \alpha_i \iff i = \max\{j \mid x \in {}_{\mu}G_{\alpha_j} \setminus {}_{\mu}G_{\alpha_{j-1}}\},$$


where, by convention, we set ${}_{\mu}G_{\alpha_0} = \emptyset$.

A necessary and sufficient condition for two fuzzy subgroups $\mu$, $\eta$ of G to be equivalent with respect to $\sim$ has been studied in [11]: $\mu \sim \eta$ if and only if $\mu$ and $\eta$ have the same sets of level subgroups, that is, they determine the same chains of subgroups of type (*). This result shows that there exists a bijection between the equivalence classes of fuzzy subgroups of G and the set of all chains of subgroups of G which end in G. So the problem of counting all distinct fuzzy subgroups of G can be translated into a combinatorial problem on L(G), the lattice of all subgroups of G: finding the number of all chains of subgroups of G that terminate in G.
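For small groups the combinatorial problem can be solved by brute force, which gives an independent check on closed formulas. The following sketch is not the authors' program; it is an illustration that lists all subgroups of $\mathbb{Z}_a \times \mathbb{Z}_b$ (each is generated by at most two elements) and counts the chains of subgroups that end in the whole group.

```python
from itertools import product


def subgroups(a, b):
    """All subgroups of Z_a x Z_b, each as a frozenset of pairs."""
    elements = list(product(range(a), range(b)))
    bound = a * b                      # every element order divides a * b
    found = set()
    for g, h in product(elements, repeat=2):
        span = frozenset(
            ((i * g[0] + j * h[0]) % a, (i * g[1] + j * h[1]) % b)
            for i in range(bound)
            for j in range(bound)
        )
        found.add(span)
    return found


def count_chains_ending_in_group(a, b):
    """Number of chains of subgroups of Z_a x Z_b whose largest member is the group."""
    subs = sorted(subgroups(a, b), key=len)
    chains = {}                        # subgroup -> number of chains ending in it
    for sub in subs:
        chains[sub] = 1 + sum(n for smaller, n in chains.items() if smaller < sub)
    return chains[subs[-1]]


for p in (2, 3, 5):
    print(p, count_chains_ending_in_group(p, p), 2 * p + 4)
```

For $\mathbb{Z}_p \times \mathbb{Z}_p$ the count agrees with the value 2p + 4 of Lemma 3.1. The search is exponential in spirit and is meant only for very small groups.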




3 Fuzzy Subgroups Enumeration 


In this section we enumerate, with respect to the equivalence relation $\sim$, the fuzzy subgroups of finite abelian p-groups of rank 2, where p is any prime number. We use the results on the number of chains of subgroups of G which end in G.


3.1 Counting Fuzzy Subgroups of $\mathbb{Z}_{p^n} \times \mathbb{Z}_p$


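Lemma 3.1 The abelian group $G = \mathbb{Z}_p \times \mathbb{Z}_p$ has 2p + 4 distinct fuzzy subgroups.


Proof Since the number of fuzzy subgroups of G is equal to the number of chains of subgroups of G which end in G, it is easy to see that there are 2p + 4 such chains. They are listed below:


$$\langle(0,1)\rangle \subset G,\quad \langle(1,0)\rangle \subset G,\quad \ldots,\quad \langle(1,p-1)\rangle \subset G,$$
$$\langle(0,0)\rangle \subset \langle(0,1)\rangle \subset G,\quad \langle(0,0)\rangle \subset \langle(1,0)\rangle \subset G,\quad \ldots,\quad \langle(0,0)\rangle \subset \langle(1,p-1)\rangle \subset G,$$


and
$$\langle(0,0)\rangle \subset G, \quad G.$$


Therefore, the number of distinct fuzzy subgroups of G is equal to 2p + 4, as desired (Fig. 1).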


Fig. 1 The chains of subgroups of $\mathbb{Z}_p \times \mathbb{Z}_p$


Similarly, we can determine the distinct fuzzy subgroups of $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$ by considering the chains shown in Fig. 2.


Fig. 2 The chains of subgroups of $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$


Consider $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$ as an extension of the subgroups $\mathbb{Z}_p \times \mathbb{Z}_p$ and $\mathbb{Z}_p$. The two sets of chains of subgroups of $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$ are as follows:


$$\mathbb{Z}_p \times \mathbb{Z}_p \subset \mathbb{Z}_{p^2} \times \mathbb{Z}_p,$$


and

$$\mathbb{Z}_p \times \{0\} \subset \langle b \rangle \subset \mathbb{Z}_{p^2} \times \mathbb{Z}_p,$$


where $\langle b \rangle$ ranges over the p subgroups with generators of the form (1, 0), (1, 1), ..., (1, p − 1). Hence,

$$\mathbb{Z}_p \times \{0\} \subset \langle b \rangle \subset \mathbb{Z}_{p^2} \times \mathbb{Z}_p$$


contributes 2 × 2 × p chains of subgroups of $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$ which end in $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$, and


$$\mathbb{Z}_p \times \mathbb{Z}_p \subset \mathbb{Z}_{p^2} \times \mathbb{Z}_p$$

has 2 × (2p + 4) chains of subgroups of $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$. Thus $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$ has $2(2p + 4) + 2(2p) = 8p + 8 = 2^3 \times p + 2^3$ distinct fuzzy subgroups. Similarly, for n = 3 we can determine the number of distinct fuzzy subgroups of $\mathbb{Z}_{p^3} \times \mathbb{Z}_p$, which is equal to


$$24p + 16 = 2^3 \times C(3, 1)\,p + 2^4,$$


where $C(n, m) = n!/((n - m)!\,m!)$.


Theorem 3.2 The p-group $\mathbb{Z}_{p^n} \times \mathbb{Z}_p$ has $2^n C(n, 1)\,p + 2^{n+1}$ distinct fuzzy subgroups.


Proof We prove the claim by induction on n. For n = 1 the formula yields 2p + 4 distinct fuzzy subgroups, in agreement with Lemma 3.1. Assume that the result is valid for a positive integer k, that is,

$$G = \mathbb{Z}_{p^k} \times \mathbb{Z}_p$$


has $2^k C(k, 1)\,p + 2^{k+1}$ fuzzy subgroups.
Now $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_p$ has two sets of chains, which are as follows:


$$\mathbb{Z}_{p^k} \times \mathbb{Z}_p \subset \mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_p,$$


and
$$\mathbb{Z}_{p^k} \times \{0\} \subset \langle b \rangle \subset \mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_p,$$


where $\langle b \rangle$ ranges over the subgroups with generators of the form (1, 0), (1, 1), ..., (1, p − 1), as illustrated in Fig. 3. The chains through


$$\mathbb{Z}_{p^k} \times \mathbb{Z}_p \subset \mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_p$$


give $2[2^k C(k, 1)\,p + 2^{k+1}]$ chains of subgroups of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_p$ which end in $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_p$.

Finally, we consider the extension of $\mathbb{Z}_{p^k} \times \{0\} \cong \mathbb{Z}_{p^k}$. The group $\mathbb{Z}_{p^k}$ has $2^k$ chains of subgroups which end in $\mathbb{Z}_{p^k}$. Each of these chains yields 2p chains of subgroups of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_p$. Hence the result is $2p\,(2^k)$.


Hence the total number of chains is equal to $2[2^k C(k, 1)\,p + 2^{k+1}] + 2p\,(2^k)$, which is equal to $2^{k+1} C(k + 1, 1)\,p + 2^{k+2}$, the number of distinct fuzzy subgroups of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_p$ (Fig. 3).


Fig. 3 The chains of subgroups of $\mathbb{Z}_{p^n} \times \mathbb{Z}_p$
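The induction step can be checked numerically. The sketch below reads the closed form of Theorem 3.2 as $2^n C(n,1)\,p + 2^{n+1}$ and the step as "twice the count for k, plus $2p \cdot 2^k$"; the function names are introduced here for illustration.

```python
from math import comb


def theorem_3_2(n, p):
    """Closed form for the number of fuzzy subgroups of Z_{p^n} x Z_p."""
    return 2**n * comb(n, 1) * p + 2 ** (n + 1)


def induction_step(k, p):
    """Count for k + 1 built from the count for k, as in the proof."""
    return 2 * theorem_3_2(k, p) + 2 * p * 2**k


for p in (2, 3, 5, 7):
    for k in range(1, 10):
        assert induction_step(k, p) == theorem_3_2(k + 1, p)

print([theorem_3_2(n, 2) for n in (1, 2, 3)])   # [8, 24, 64]
```

For n = 1, 2, 3 the closed form reproduces 2p + 4, 8p + 8 and 24p + 16, the values obtained in the text.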


3.2 Counting Fuzzy Subgroups of $\mathbb{Z}_{p^n} \times \mathbb{Z}_{p^2}$


Lemma 3.3 The group $G = \mathbb{Z}_{p^2} \times \mathbb{Z}_{p^2}$ has $12p^2 + 24p + 16$ distinct fuzzy subgroups.

Proof Since $\mathbb{Z}_{p^2} \times \mathbb{Z}_{p^2}$ is a non-cyclic group of order $p^4$, it has subgroups of order k for every divisor k of $p^4$. Hence every maximal chain of $\mathbb{Z}_{p^2} \times \mathbb{Z}_{p^2}$ is a chain of the form


$$\{0\} \subset \langle a \rangle \subset \langle b \rangle \subset \langle c \rangle \subset G,$$


where $\langle a \rangle$, $\langle b \rangle$, $\langle c \rangle$ are subgroups of G of orders $p$, $p^2$, $p^3$, respectively. Consider $\langle a \rangle$ as a representative of the subgroups with generators of the forms 0p; p(ip), for i = 1, 2, 3, ..., p − 1, whose pairwise intersections contain the identity alone. It is easy to verify that there are only p + 1 such subgroups. The subgroups of the form $\langle b \rangle$ fall into different slots, as shown in the table of Fig. 4. Again it is not difficult to see that there are p + 1 such slots, each with a membership of p + 1. It should be noted that in this count one group, namely the one generated by (0p, p0), has been counted more than once; the equation $(p + 1)(p + 1) - p = p^2 + p + 1$ fixes the overcount. This means that $\langle b \rangle$ comprises $p^2 + p$ cyclic subgroups and the non-cyclic subgroup (0p, p0). The intersection $\bigcap \langle b \rangle \setminus (0p, p0)$ is empty. With the exception of the non-cyclic (0p, p0), which contains all the subgroups $\langle a \rangle$ and is itself contained in all $\langle c \rangle$, each of the remaining subgroups in each slot is contained in only one subgroup $\langle c \rangle$. By counting, there are p + 1 subgroups of the form $\langle c \rangle$, each contained in G. Alternatively, the chains of subgroups of G which end in G can be viewed as extensions of chains of the previous groups. The chains of G are extensions of chains of $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$, or $\mathbb{Z}_p \times \mathbb{Z}_p$, or $\mathbb{Z}_p$. The group $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$ contributes 8p + 8 chains of subgroups which terminate in $\mathbb{Z}_{p^2} \times \mathbb{Z}_p$, and these give 2(8p + 8) chains of G. Next we consider the chains that go through (0p, p0); such chains are extensions of chains of $\mathbb{Z}_p \times \mathbb{Z}_p$. Each of the 2p + 4 chains of $\mathbb{Z}_p \times \mathbb{Z}_p$ (Lemma 3.1) can be extended in 2p ways, the path through (0p, 10) having already been considered. Hence we have 2p(2p + 4) such extensions. Finally, the extension of $\mathbb{Z}_p$ contributes $2(4p^2)$ chains of subgroups of G which terminate in G and have not been considered before, as can be checked. Thus the total number of fuzzy subgroups of G is equal to


$$2(8p + 8) + 2p(2p + 4) + 2(4p^2) = 12p^2 + 24p + 16.$$


Theorem 3.4 The number of distinct fuzzy subgroups of the abelian group $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^2}$ is equal to


$$2^n[C(n, 2) + 2C(n - 1, 1)]\,p^2 + 2^{n+1} C(n + 1, 1)\,p + 2^{n+2}.$$


Proof We prove the claim by induction on n. Let C(a, b) = 0 for a < b. For n = 2 the formula gives the number of chains of subgroups of $\mathbb{Z}_{p^2} \times \mathbb{Z}_{p^2}$ found in Lemma 3.3.


Assume that $G = \mathbb{Z}_{p^k} \times \mathbb{Z}_{p^2}$ has $2^k[C(k, 2) + 2C(k - 1, 1)]\,p^2 + 2^{k+1} C(k + 1, 1)\,p + 2^{k+2}$ chains of subgroups which end in $\mathbb{Z}_{p^k} \times \mathbb{Z}_{p^2}$. We must show that the result is true for n = k + 1. Every chain of subgroups of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_{p^2}$ can be considered as an extension of a chain of subgroups of $\mathbb{Z}_{p^k} \times \mathbb{Z}_{p^2}$, or $\mathbb{Z}_{p^k} \times \mathbb{Z}_p$, or $\mathbb{Z}_{p^k}$. We first determine the number of chains of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_{p^2}$ which are extensions of $\mathbb{Z}_{p^k} \times \mathbb{Z}_{p^2}$. By assumption there are


$$2^k[C(k, 2) + 2C(k - 1, 1)]\,p^2 + 2^{k+1} C(k + 1, 1)\,p + 2^{k+2}$$


such chains, and each gives only two chains through the generator. Hence the number of these chains of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_{p^2}$ is equal to


$$2\{2^k[C(k, 2) + 2C(k - 1, 1)]\,p^2 + 2^{k+1} C(k + 1, 1)\,p + 2^{k+2}\}.$$


We next consider those which are extensions of $\mathbb{Z}_{p^k} \times \mathbb{Z}_p$. Each chain of $\mathbb{Z}_{p^k} \times \mathbb{Z}_p$ generates 2p chains in the extension to $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_{p^2}$. Hence the number of chains of subgroups of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_{p^2}$ arising from the subgroups $\mathbb{Z}_{p^k} \times \mathbb{Z}_p$ is equal to $2p[2^k C(k, 1)\,p + 2^{k+1}]$ (by Theorem 3.2). Consider the extensions of $\mathbb{Z}_{p^k}$. The number of chains of subgroups of $\mathbb{Z}_{p^k}$ is equal to $2^k$. Similarly, by Lemma 3.3, the number of chains of subgroups of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_{p^2}$ in the extension of $\mathbb{Z}_{p^k}$ is equal to $2^k(4 \times p \times p)$. Hence the total number of chains of subgroups of $\mathbb{Z}_{p^{k+1}} \times \mathbb{Z}_{p^2}$ is equal to


$$2\{2^k[C(k, 2) + 2C(k - 1, 1)]\,p^2 + 2^{k+1} C(k + 1, 1)\,p + 2^{k+2}\} + 2p[2^k C(k, 1)\,p + 2^{k+1}] + 2^k(4 \cdot p \cdot p)$$
$$= 2^{k+1}[C(k + 1, 2) + 2C(k + 1 - 1, 1)]\,p^2 + 2^{k+2} C(k + 1 + 1, 1)\,p + 2^{k+3}.$$


This completes the proof. The table of subgroups of G is shown in Fig. 4.


⟨a⟩ = 0p; p(αp)                              p + 1          α = 0, 1, ..., p − 1
⟨b⟩ = 01; (0p, p0); pβ                                      β = 1, ..., p − 1
⟨b⟩ = (0p, p0); 1(0 + αp)
⟨b⟩ = (0p, p0); 1(1 + αp)
  ...                                        p² + p + 1     α = 0, 1, ..., p − 1
⟨b⟩ = (0p, p0); 1((p − 1) + αp)
⟨c⟩ = (01, p0); (0p, p0); (1δ, 1(δ + p))     p + 1          δ = 1, ..., p − 1


Fig. 4 The table of subgroups of $\mathbb{Z}_{p^2} \times \mathbb{Z}_{p^2}$


4 Counting Distinct Fuzzy Subgroups of $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^m}$


We state the following results, which pertain to every positive integer n and to 3 ≤ m ≤ 5. They can be proved by an argument similar to that of Theorem 3.4.


Theorem 4.1 The number of fuzzy subgroups of the group $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^3}$ is equal to


$$2^n[C(n, 3) + 4C(n - 1, 2) + 6C(n - 2, 1)]\,p^3 + 2^{n+1}[C(n + 1, 2) + 2C(n, 1)]\,p^2 + 2^{n+2} C(n + 2, 1)\,p + 2^{n+3}.$$

Theorem 4.2 The number of fuzzy subgroups of the group $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^4}$ is equal to


$$2^n[C(n, 4) + 6C(n - 1, 3) + 16C(n - 2, 2) + 22C(n - 3, 1)]\,p^4 + 2^{n+1}[C(n + 1, 3) + 4C(n, 2) + 6C(n - 1, 1)]\,p^3$$
$$+\ 2^{n+2}[C(n + 2, 2) + 2C(n + 1, 1)]\,p^2 + 2^{n+3} C(n + 3, 1)\,p + 2^{n+4}.$$


Theorem 4.3 The number of distinct fuzzy subgroups of the abelian group $G = \mathbb{Z}_{p^n} \times \mathbb{Z}_{p^5}$ is

$$2^n[C(n, 5) + 8C(n - 1, 4) + 30C(n - 2, 3) + 68C(n - 3, 2) + 90C(n - 4, 1)]\,p^5$$
$$+\ 2^{n+1}[C(n + 1, 4) + 6C(n, 3) + 16C(n - 1, 2) + 22C(n - 2, 1)]\,p^4$$
$$+\ 2^{n+2}[C(n + 2, 3) + 4C(n + 1, 2) + 6C(n, 1)]\,p^3 + 2^{n+3}[C(n + 3, 2) + 2C(n + 2, 1)]\,p^2 + 2^{n+4} C(n + 4, 1)\,p + 2^{n+5}.$$


Conjecture: Based on the many examples considered and on the data pertaining to this work, which have been checked by means of a computer program, we conjecture that the number of fuzzy subgroups of the abelian group $\mathbb{Z}_{p^n} \times \mathbb{Z}_{p^m}$ is equal to


$$\sum_{j=0}^{m-1} 2^{n}\,[C(n - j,\, m - j)] * T_m \; p^{m} + \rho(n + 1)\rho(m - 1),$$


with $\rho(n + 1)\rho(1 - 1) = 2^{n+1}$, where $T_m$ is the mth row of Schroeder’s triangle (for more details see [18], A033877). Note that $\rho(n + 1)\rho(m - 1)$ is the number of fuzzy subgroups of

$$\mathbb{Z}_{p^{n+1}} \times \mathbb{Z}_{p^{m-1}},$$

and $*$ is the usual pointwise multiplication. The array (A033877) is described mathematically in the following way:

$T(n, k) = 0$ if $k < n$, $T(n, k) = T(n, k - 1) + T(n - 1, k - 1) + T(n - 1, k)$ otherwise, and $T(1, k) = 1$.

For the first few rows, the array reads as follows (Fig. 5). 

This array is associated with several sequences [20]. For instance, the sequence obtained by taking the last entry to the right in each row, namely 1, 2, 6, 22, 90, 394, ..., is generally called the Large Schroeder Numbers (LSN). Also, the sum of the entries in each row (the row sums) generates a sequence whose terms grow rapidly. This sequence is commonly associated with Schroeder’s second problem (generalized parentheses), and its terms are sometimes called super-Catalan numbers or little Schroeder numbers. There are various other interpretations of these types of numbers. H. Bottomley gave a graphical illustration of the initial terms ([18], A033877).
1
1 2
1 4 6
1 6 16 22
1 8 30 68 90
1 10 48 146 304 394
1 12 70 264 714 1412 1806

Fig. 5 Large Schroeder Numbers (LSN)


Another sequence associated with this array is called royal paths in a lattice and consists of the terms 1, 4, 16, 68, 304, 1412, .... Numerous works on this array have been written by Schroeder himself and by various other mathematicians. All of this can be found by visiting the site [18].
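The array and the conjectured count can be explored with a short script. This is a sketch under stated assumptions, not the authors' program: the triangle is built from the recurrence with $T(1, k) = 1$, and the conjecture is read as the recursion suggested by Theorems 3.2 to 4.3, namely a leading term $2^n \bigl(\sum_j T_m(j)\,C(n-j, m-j)\bigr) p^m$ plus the count for $\mathbb{Z}_{p^{n+1}} \times \mathbb{Z}_{p^{m-1}}$, ending with $2^n$ for the cyclic group $\mathbb{Z}_{p^n}$. The function names are introduced here.

```python
from math import comb


def schroeder_rows(count):
    """First `count` rows of the array; row k holds T(1, k), ..., T(k, k)."""
    rows, prev = [], []
    for k in range(1, count + 1):
        row = [1]                                          # T(1, k) = 1
        for n in range(2, k + 1):
            up = prev[n - 1] if n - 1 < len(prev) else 0   # T(n, k-1); zero if k-1 < n
            row.append(up + prev[n - 2] + row[n - 2])      # + T(n-1, k-1) + T(n-1, k)
        rows.append(row)
        prev = row
    return rows


def C(a, b):
    """Binomial coefficient with C(a, b) = 0 for a < b, as in Theorem 3.4."""
    return comb(a, b) if a >= b >= 0 else 0


def conjectured_count(n, m, p):
    """Conjectured number of fuzzy subgroups of Z_{p^n} x Z_{p^m}."""
    if m == 0:
        return 2**n                                        # chains of the cyclic group
    t_m = schroeder_rows(m)[-1]
    leading = sum(t_m[j] * C(n - j, m - j) for j in range(m))
    return 2**n * leading * p**m + conjectured_count(n + 1, m - 1, p)


for row in schroeder_rows(5):
    print(*row)
print(conjectured_count(1, 1, 3), conjectured_count(2, 1, 3))   # 10 32
```

For m = 1 the recursion reduces to the closed form of Theorem 3.2, and the last entries of the rows are the Large Schroeder Numbers quoted in the text.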